Write a Javascript parser from scratch

I have been studying ASTs recently. A previous article was titled "Interviewer: Have you heard of Babel? Have you ever written a Babel plugin? Answer: No." Why bother learning about the AST? Because once you understand it, you really can do whatever you want.

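Put simply: use JavaScript to run JavaScript code.

This article shows how to write a simple one. Strictly speaking it is an interpreter: parsing is delegated to an existing library, and the code here walks the resulting AST.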

Foreword (skip if you know how to execute custom JS code)

How many ways do you know to execute custom scripts? Let’s list them:

Web

Create a script element and insert it into the document

function runJavascriptCode(code) {
  const script = document.createElement("script");
  script.innerText = code;
  document.body.appendChild(script);
}

runJavascriptCode("alert('hello world')");

eval

Countless people say you should not use eval, even though it can execute custom scripts

eval("alert('hello world')");

Why is using the JavaScript eval function a bad idea?

setTimeout

setTimeout can do the same thing when it is given a string, but it defers execution to a later turn of the event loop

setTimeout("console.log('hello world')");
console.log("I should run first");

// output
// I should run first
// hello world

new Function

new Function("alert('hello world')")();

Are eval() and new Function() the same thing?

Node.js

require

You can write JavaScript code in a .js file and then require it from another file, which executes it.

Node.js caches every module it loads, so executing N such files can consume a lot of memory. Once a file has been executed, clear its cache entry manually.
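A minimal sketch of that cleanup, assuming Node's CommonJS loader. The helper name runFile and the temporary job.js file are hypothetical and exist only for illustration:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: execute a file through require, then evict it from
// the module cache so that the next call executes the file again.
function runFile(filePath) {
  const resolved = require.resolve(filePath);
  try {
    return require(resolved);
  } finally {
    delete require.cache[resolved];
  }
}

// Write a throwaway script that counts how many times it has been executed.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "run-file-"));
const file = path.join(dir, "job.js");
fs.writeFileSync(
  file,
  "global.runCount = (global.runCount || 0) + 1; module.exports = global.runCount;"
);

const first = runFile(file);
const second = runFile(file);
console.log(first, second); // 1 2
```

Without the delete, the second require would return the cached exports and the script body would not run again.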

vm

const vm = require("vm");

const sandbox = {
  animal: "cat",
  count: 2
};

vm.runInNewContext('count += 1; name = "kitty"', sandbox);
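To see what the call leaves behind, the sandbox object can be inspected afterwards. This is a small self-contained sketch using the same values as above:

```javascript
const vm = require("vm");

const sandbox = { animal: "cat", count: 2 };

// The script runs with `sandbox` as its global object, so both the update to
// `count` and the implicit global `name` land on the sandbox.
vm.runInNewContext('count += 1; name = "kitty"', sandbox);

console.log(sandbox); // { animal: 'cat', count: 3, name: 'kitty' }
```

The surrounding program's own globals are not touched; only the sandbox changes.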

Apart from Node's vm module, none of the approaches above executes custom code gracefully, and every one of them depends on an API provided by the host environment.

What an interpreter is for

It lets you execute custom code on any platform that can run JavaScript.

Mini programs, for example, block all of the above ways of executing custom code

Is it really impossible to execute custom code?

How it works

Based on the AST(abstract syntax tree), find the corresponding object/method, and then execute the corresponding expression.

console.log("hello world");

How it works: Use the AST to find the console object, find its log function, and run the function with the hello world parameter
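Those three steps can be sketched by hand. The AST below is a simplified, hand-written stand-in for what a parser would produce for this statement (real parser output carries more fields, such as source locations), and the variable names are only illustrative:

```javascript
// Hand-written, simplified AST for: console.log("hello world")
const callNode = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "console" },
    property: { type: "Identifier", name: "log" }
  },
  arguments: [{ type: "StringLiteral", value: "hello world" }]
};

// The execution context: the names the interpreted code is allowed to see.
const context = { console };

// 1. Find the console object by name.
const object = context[callNode.callee.object.name];
// 2. Find its log function.
const method = object[callNode.callee.property.name];
// 3. Call the function with the literal arguments.
const args = callNode.arguments.map(arg => arg.value);
method.apply(object, args); // prints: hello world
```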

Tools to prepare

babylon: parses the code and generates the AST
babel-types: checks node types
AST Explorer: lets you inspect the abstract syntax tree at any time

Getting started

Let's take running console.log("hello world") as an example.

Open AST Explorer to view the corresponding AST.

As the figure shows, to reach console.log("hello world") we have to traverse down the tree through the File, Program, ExpressionStatement, CallExpression, and MemberExpression nodes. Identifier and StringLiteral nodes are involved as well.

We begin by defining visitors, which specify how to handle each type of node

const visitors = {
  File(){},
  Program(){},
  ExpressionStatement(){},
  CallExpression(){},
  MemberExpression(){},
  Identifier(){},
  StringLiteral(){}
};

Define another function that traverses the node

/**
 * Traverse a node
 * @param {Node} node the AST node to evaluate
 * @param {*} scope the execution context
 */
function evaluate(node, scope) {
  const _evaluate = visitors[node.type];
  // If there is no handler for this node type, throw an error
  if (!_evaluate) {
    throw new Error(`Unknown visitors of ${node.type}`);
  }
  // Execute the handler function corresponding to this node
  return _evaluate(node, scope);
}

The following is a processing implementation for each node

const babylon = require("babylon");
const types = require("babel-types");

const visitors = {
  File(node, scope) {
    evaluate(node.program, scope);
  },
  Program(program, scope) {
    for (const node of program.body) {
      evaluate(node, scope);
    }
  },
  ExpressionStatement(node, scope) {
    return evaluate(node.expression, scope);
  },
  CallExpression(node, scope) {
    // Get the caller object
    const func = evaluate(node.callee, scope);

    // Get the parameters of the function
    const funcArguments = node.arguments.map(arg => evaluate(arg, scope));

    // To get properties: console.log
    if (types.isMemberExpression(node.callee)) {
      const object = evaluate(node.callee.object, scope);
      return func.apply(object, funcArguments);
    }
  },
  MemberExpression(node, scope) {
    const { object, property } = node;

    // Find the corresponding attribute name
    const propertyName = property.name;

    // Find the corresponding object
    const obj = evaluate(object, scope);

    // Get the corresponding value
    const target = obj[propertyName];

    // Return this value. If this value is function, then the context this should be bound
    return typeof target === "function" ? target.bind(obj) : target;
  },
  Identifier(node, scope) {
    // Get the value of the variable
    return scope[node.name];
  },
  StringLiteral(node) {
    return node.value;
  }
};

function evaluate(node, scope) {
  const _evaluate = visitors[node.type];
  if (!_evaluate) {
    throw new Error(`Unknown visitors of ${node.type}`);
  }
  // Call recursively
  return _evaluate(node, scope);
}

const code = "console.log('hello world')";

// Generate the AST tree
const ast = babylon.parse(code);

// Evaluate the AST
// You need to pass in the execution context, otherwise the `console` object cannot be found
evaluate(ast, { console: console });

Try it in Nodejs

$ node ./index.js
hello world

Let's try another one:

const code = "console.log(Math.pow(2, 2))";

Because the context has no Math object, this throws TypeError: Cannot read property 'pow' of undefined

Remember to pass Math in with the context: evaluate(ast, { console, Math });

Running it again produces a different error: Error: Unknown visitors of NumericLiteral

It turns out that the 2 in Math.pow(2, 2) is a numeric literal

Its node type is NumericLiteral, but our visitors do not yet define how to handle that node.

So let’s add this node:

NumericLiteral(node){
    return node.value;
  }

If we run it again, we’ll get exactly what we expect

$ node ./index.js
4

At this point, you’ve implemented the most basic function calls

Going further

Since it’s an interpreter, can it only run Hello World? Obviously not

Let’s declare a variable

var name = "hello world";
console.log(name);

So let’s look at the AST structure

The processing of VariableDeclaration and VariableDeclarator nodes is missing in visitors, so we add them

VariableDeclaration(node, scope) {
    const kind = node.kind;
    for (const declarator of node.declarations) {
      const {name} = declarator.id;
      const value = declarator.init
        ? evaluate(declarator.init, scope)
        : undefined;
      scope[name] = value;
    }
  },
  VariableDeclarator(node, scope) {
    scope[node.id.name] = evaluate(node.init, scope);
  }

Run the code now and it prints hello world

Next, let's declare a function

function test() {
  var name = "hello world";
  console.log(name);
}
test();

Following the same steps as above, we add handlers for several new nodes

BlockStatement(block, scope) {
    for (const node of block.body) {
      // Execute the contents of the code block
      evaluate(node, scope);
    }
  },
  FunctionDeclaration(node, scope) {
    // Get the function
    const func = visitors.FunctionExpression(node, scope);

    // Define function in scope
    scope[node.id.name] = func;
  },
  FunctionExpression(node, scope) {
    // Create a function
    const func = function() {
      // TODO: get the parameters of the function
      // Execute the contents of the code block
      evaluate(node.body, scope);
    };

    // Return this function
    return func;
  }

Then modify CallExpression

// To get properties: console.log
if (types.isMemberExpression(node.callee)) {
  const object = evaluate(node.callee.object, scope);
  return func.apply(object, funcArguments);
} else if (types.isIdentifier(node.callee)) {
  // new: a plain function call such as test()
  return func.apply(scope, funcArguments);
}

Running it also prints hello world
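The FunctionExpression handler above leaves parameter handling as a TODO. One possible way to fill it in is sketched below. It is self-contained, uses a hand-written AST instead of parser output, and the names print, printed and greet are assumptions made for the example. Like the article's code, it writes the parameters into the single shared scope:

```javascript
// Simplified visitors; only the node types needed for this sketch.
const visitors = {
  Identifier: (node, scope) => scope[node.name],
  StringLiteral: node => node.value,
  ExpressionStatement: (node, scope) => evaluate(node.expression, scope),
  BlockStatement(block, scope) {
    for (const node of block.body) {
      evaluate(node, scope);
    }
  },
  CallExpression(node, scope) {
    const func = evaluate(node.callee, scope);
    const args = node.arguments.map(arg => evaluate(arg, scope));
    return func.apply(scope, args);
  },
  FunctionExpression(node, scope) {
    return function (...args) {
      // Bind each declared parameter name to the argument at the same position.
      node.params.forEach((param, index) => {
        scope[param.name] = args[index];
      });
      evaluate(node.body, scope);
    };
  }
};

function evaluate(node, scope) {
  const handler = visitors[node.type];
  if (!handler) {
    throw new Error(`Unknown visitors of ${node.type}`);
  }
  return handler(node, scope);
}

// Hand-written AST for: function (who) { print(who); }
const functionNode = {
  type: "FunctionExpression",
  params: [{ type: "Identifier", name: "who" }],
  body: {
    type: "BlockStatement",
    body: [
      {
        type: "ExpressionStatement",
        expression: {
          type: "CallExpression",
          callee: { type: "Identifier", name: "print" },
          arguments: [{ type: "Identifier", name: "who" }]
        }
      }
    ]
  }
};

// `print` records its argument so the result is easy to inspect.
const printed = [];
const scope = { print: value => printed.push(value) };

const greet = visitors.FunctionExpression(functionNode, scope);
greet("hello world");
console.log(printed); // [ 'hello world' ]
```

Return values are still not handled here; supporting them would need a ReturnStatement handler and a way to stop evaluating the rest of the block.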

Complete sample code

Other notes

I don’t have the space to go into how to deal with all the nodes, but the basic principles are covered.
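As an illustration of how another node type might be handled in the same style, here is a hedged sketch of a BinaryExpression handler. The binaryOperators table is an assumption of this example and covers only four operators; the AST is written by hand:

```javascript
// Assumed operator table; a fuller interpreter would cover every operator.
const binaryOperators = {
  "+": (left, right) => left + right,
  "-": (left, right) => left - right,
  "*": (left, right) => left * right,
  "/": (left, right) => left / right
};

const visitors = {
  NumericLiteral: node => node.value,
  BinaryExpression(node, scope) {
    const operate = binaryOperators[node.operator];
    if (!operate) {
      throw new Error(`Unsupported operator ${node.operator}`);
    }
    // Evaluate both operands first, then combine them.
    return operate(evaluate(node.left, scope), evaluate(node.right, scope));
  }
};

function evaluate(node, scope) {
  const handler = visitors[node.type];
  if (!handler) {
    throw new Error(`Unknown visitors of ${node.type}`);
  }
  return handler(node, scope);
}

// Hand-written AST for: 1 + 2 * 3
const ast = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "NumericLiteral", value: 1 },
  right: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "NumericLiteral", value: 2 },
    right: { type: "NumericLiteral", value: 3 }
  }
};

const result = evaluate(ast, {});
console.log(result); // 7
```

Operator precedence needs no special handling in the evaluator because the parser has already encoded it in the shape of the tree.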

For other nodes you can do the same. Note, however, that I used a single scope; there are no parent/child scopes

This means that the following code runs without error, even though b should not be visible outside test

var a = 1;
function test() {
  var b = 2;
}
test();
console.log(b); // 2

To fix this, create a new scope while recursing through the AST whenever you enter a construct that introduces one, such as a function, a for...in loop, and so on
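One way to model parent/child scopes is a small class in which each scope holds its own variables plus a link to its parent. This is only a sketch; the Scope class and its method names are hypothetical and are not taken from the article's code:

```javascript
// Hypothetical scope chain: each scope owns its variables and knows its parent.
class Scope {
  constructor(parent = null) {
    this.parent = parent;
    this.variables = new Map();
  }

  // Declare a variable in this scope only.
  declare(name, value) {
    this.variables.set(name, value);
  }

  // Look a name up here first, then walk outwards through the parents.
  lookup(name) {
    for (let scope = this; scope !== null; scope = scope.parent) {
      if (scope.variables.has(name)) {
        return scope.variables.get(name);
      }
    }
    throw new ReferenceError(`${name} is not defined`);
  }
}

const globalScope = new Scope();
globalScope.declare("a", 1);

// Entering a function body would create a child scope like this one.
const functionScope = new Scope(globalScope);
functionScope.declare("b", 2);

console.log(functionScope.lookup("a")); // 1

let leaked = true;
try {
  globalScope.lookup("b");
} catch (error) {
  leaked = !(error instanceof ReferenceError);
}
console.log(leaked); // false
```

With this structure, a function handler would evaluate its body against a fresh child scope, so b stays invisible to the outer code.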

Finally

This is just a simple model. It is not even a toy, and it still has plenty of pitfalls, such as:

Variable hoisting: the scope should have a pre-parse phase
Scoping has many problems
Certain nodes must be nested under a specific node. For example, super() must be inside a Class node, no matter how many layers deep it is nested
this binding
...
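For the first item, the pre-parse phase could look roughly like the sketch below. The helper hoistVarDeclarations is hypothetical, and it is deliberately simplified: it only scans the statements directly in a body, whereas real hoisting also covers var declarations inside nested blocks as well as function declarations:

```javascript
// Hypothetical pre-pass: declare every `var` name found directly in `body`
// as undefined before any statement is evaluated.
function hoistVarDeclarations(body, scope) {
  for (const node of body) {
    if (node.type !== "VariableDeclaration" || node.kind !== "var") {
      continue;
    }
    for (const declarator of node.declarations) {
      if (!(declarator.id.name in scope)) {
        scope[declarator.id.name] = undefined;
      }
    }
  }
}

// Hand-written AST body for:
//   console.log(name);
//   var name = "hello world";
const body = [
  {
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: {
        type: "MemberExpression",
        object: { type: "Identifier", name: "console" },
        property: { type: "Identifier", name: "log" }
      },
      arguments: [{ type: "Identifier", name: "name" }]
    }
  },
  {
    type: "VariableDeclaration",
    kind: "var",
    declarations: [
      {
        type: "VariableDeclarator",
        id: { type: "Identifier", name: "name" },
        init: { type: "StringLiteral", value: "hello world" }
      }
    ]
  }
];

const scope = {};
hoistVarDeclarations(body, scope);
console.log("name" in scope, scope.name); // true undefined
```

A Program or function handler would call such a pre-pass before evaluating its statements, so that the first statement sees name as undefined.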

After several late nights, I wrote a relatively complete library, vm.js, modified from JSJS. It stands on the shoulders of giants.

The differences are:

Refactored the recursion, which solves some previously unsolvable problems
Fixed a number of bugs
Added test cases
Added support for ES6 and other syntactic sugar

It is still in development, and the first version will be released when it is more complete.

Criticism and PRs from experienced developers are welcome.

In the future, a mini program could become a "large program": the business code would be pushed over WebSocket and executed, and the mini program's own source would be just a shell. Exciting to think about.

Project address: github.com/axetroy/vm….

Online preview: axetry.github.io/vm.js/

Original text: axetroy.xyz/#/post/172