Indepth.dev /source-maps…

This article gives a basic overview and an in-depth explanation of the mechanics of JavaScript code generators and source maps. Because we build our own from scratch, it is a long read.

Source maps are a mystery to most people. They are found in most web compilation pipelines, from type systems to web bundlers. But the details of how they are actually built are often not fully transparent, since merely using them can be complex enough. Today we start with a brief overview of what a source map is and how to use one. Then we move on to the underlying mechanics by building our own compiler, which generates some code and produces its own source map for the browser to consume.

This is part of my Behind-the-scenes World series:

  • Type systems (such as TypeScript)
  • React hooks
  • Web Bundlers (e.g. Webpack)

The full video of the article is here. This is part of my “Behind the Scenes world” video series.

Here is a summary of the article:

Introduction to Source Maps and the compiler

  1. What is a source map? Why is it useful?
  2. Using source maps with popular tools
  3. What is an AST?
  4. Steps in transforming JavaScript
  5. How compilers build source maps

Build your own compiler

  1. Building a JavaScript code generator
  2. What is Base64 VLQ?
  3. Adding source map support
  4. Testing our source map

Introduction to Source Maps and the compiler

What is a Source map? Why is it useful?

First, let’s look at why people write JavaScript that has to be transpiled into native JavaScript:

  • To use a type system
  • To use the latest ES8, ES9 and ES10 features
  • Code optimization (e.g. minification)
  • Bundle optimization (e.g. vendor vs. app bundles)

The modern compiler architecture looks like this:

This is where source maps come in!

The basic definition of a Source map is:

“Source Map provides a way to map code in a compressed file back to its original location in the source file”

So the purpose is simple. Modern browsers parse the source map automatically and make it appear as though you are running the unminified, uncombined files.

The following example shows TypeScript being debugged in the browser, which is only possible because of source maps.

You can now place a breakpoint in your code and inspect the call stack, variables and any runtime state in the browser, all against the original, pre-compilation TypeScript code.

2. Using source maps with popular tools

There are two ways to notify the browser that a Source map is available.

  1. Add //# sourceMappingURL=/path/to/file.js.map to the footer of the JS file
  2. Add X-SourceMap: /path/to/file.js.map to the HTTP response headers of the JS file

A few points to note:

  • Chrome only downloads source maps while DevTools is open (because they can be large)
  • Source maps do not appear as network requests (in the “Network” tab)
  • Once you have the source map, you can add breakpoints in the source code (under the “Sources” tab).
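As a rough illustration of the first option, the sketch below finds the trailing sourceMappingURL comment in a generated file. The helper name findSourceMappingURL and the regular expression are assumptions made for this example; they are not how any particular browser implements the lookup.

```javascript
// Hypothetical helper: return the URL from the last
// "//# sourceMappingURL=..." comment in a generated file, or null.
function findSourceMappingURL(generatedCode) {
  const pattern = /\/\/[#@]\s*sourceMappingURL=(\S+)\s*$/gm;
  let url = null;
  let match;
  while ((match = pattern.exec(generatedCode)) !== null) {
    url = match[1]; // keep the last occurrence
  }
  return url;
}

const generated = [
  "function add(n) { return 1 + n; }",
  "//# sourceMappingURL=/path/to/file.js.map"
].join("\n");

console.log(findSourceMappingURL(generated)); // "/path/to/file.js.map"
```

A file without the comment simply yields null, which a tool could treat as "no source map available".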

The Source map standard

Source maps must now follow the latest version of the source map specification, version 3, which can be found here. The specification was written primarily by Mozilla and Google engineers. Version 3 reduces the overall size of a source map, which speeds up downloading and parsing.

Here is a source map example, with the emphasis on “mappings”. These are Base64 VLQ strings that contain the actual mappings from the source code to the generated code. We will produce our own later.
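As a rough illustration, a version 3 source map has the shape below. This object is written by hand for this example: the file names and the single AAAA segment are illustrative values, not the output of a real build.

```javascript
// Hand-written illustration of the version 3 source map format.
const exampleSourceMap = {
  version: 3,                // spec version
  file: "index.es5.js",      // generated file this map describes
  sources: ["index.es6.js"], // original source files
  names: [],                 // original identifier names referenced by mappings
  mappings: "AAAA"           // Base64 VLQ segments; ";" separates lines, "," separates segments
};

console.log(JSON.stringify(exampleSourceMap, null, 2));
```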

Usage in popular tools

Node.js

Via the flag --enable-source-maps

When an exception occurs, the source maps are cached and used for the stack trace.

Babel

By default, Babel appends a source map location to the end of each generated bundle. For example:

//# sourceMappingURL=file.map.js

The flag --source-maps inline tells Babel to use an inline source map, as shown below (that is, the contents as a Base64-encoded string):

//# sourceMappingURL=data:application/json;charset=utf-8;base64,...
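To make the inline form concrete, here is a minimal sketch that turns a source map object into such a comment using Node’s built-in Buffer. The map contents and the helper name toInlineSourceMapComment are assumptions for illustration; Babel’s own implementation may differ.

```javascript
// Hypothetical helper: encode a source map object as an inline
// "sourceMappingURL" comment (a Base64 data URL).
function toInlineSourceMapComment(sourceMap) {
  const json = JSON.stringify(sourceMap);
  const base64 = Buffer.from(json, "utf8").toString("base64");
  return "//# sourceMappingURL=data:application/json;charset=utf-8;base64," + base64;
}

const inlineComment = toInlineSourceMapComment({
  version: 3,
  file: "out.js",
  sources: ["in.js"],
  names: [],
  mappings: "AAAA"
});

// Decoding the payload gives back the original JSON.
const payload = inlineComment.split("base64,")[1];
const decoded = JSON.parse(Buffer.from(payload, "base64").toString("utf8"));
console.log(decoded.file); // "out.js"
```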
Webpack

devtool: 'source-map'

It is worth noting that tools like Webpack often run several transformations through multiple processors at once (for example Babel and TypeScript) and can still produce a single source map. Each processor generates its own source map, and there are libraries that can concatenate JavaScript files while consolidating the corresponding source map files. Mapcat is one example.

What is AST?

Before we go any further, we need to take a quick look at one of the most important mechanisms in JavaScript compilers: AST.

AST stands for “Abstract Syntax Tree”, which is essentially a tree of “nodes” representing the code of a program. A “node” is the smallest unit, basically a POJO (a plain old JavaScript object) with “type” and “loc” (location) properties. All nodes have these two properties, and they can have various others depending on their type.

In AST form, code is easy to manipulate: nodes can be added, removed or even replaced.

Here is a sample code:

Will become the following AST:

Some sites, such as astexplorer.net, let you write JavaScript code and immediately see its AST.

Tree traversal

The most important part of working with an AST is understanding that there are different traversal approaches, each with its own strengths and weaknesses.

A popular example (of the type we’ll use today) is “depth-first traversal,” which works by starting at the root and exploring as far left as possible in each branch before backtracking. Therefore, it will process a tree in the following order:

If we have a piece of code like:

2 + 3 * 1

The following tree will be generated:
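As a sketch of that tree and of the traversal order, the snippet below writes the AST for 2 + 3 * 1 by hand (in the ESTree shape most JavaScript parsers produce, with location data left out) and walks it depth first. The helper name depthFirst is made up for this example.

```javascript
// Hand-written AST for "2 + 3 * 1". Multiplication binds tighter,
// so "3 * 1" is the right-hand child of the "+" node.
const tree = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Literal", value: 2 },
  right: {
    type: "BinaryExpression",
    operator: "*",
    left: { type: "Literal", value: 3 },
    right: { type: "Literal", value: 1 }
  }
};

// Visit a node, then its left subtree, then its right subtree.
function depthFirst(node, visited = []) {
  visited.push(node.type === "Literal" ? String(node.value) : node.operator);
  if (node.left) depthFirst(node.left, visited);
  if (node.right) depthFirst(node.right, visited);
  return visited;
}

console.log(depthFirst(tree).join(" ")); // "+ 2 * 3 1"
```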

Steps to transform JavaScript

There are three steps to converting JavaScript:

1) Parse the source code into an AST

  • Lexical analysis: turn the code string into a stream of tokens (e.g. an array)
  • Syntax analysis: turn the token stream into its AST representation

2) Transform the nodes on the AST

Manipulate the AST nodes (any library plug-in, such as Babel’s, operates here)

3) Generate source code

Turn the AST into a string of JavaScript source code
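Lexical analysis, the first half of step 1, can be sketched in a few lines. This toy tokenizer only understands integers and the operators + - * /, and the token shape ({ type, value }) is an assumption for this example; real parsers such as Esprima handle far more.

```javascript
// Toy lexer: turn a source string into an array of tokens.
function tokenize(source) {
  const text = source.trim();
  const pattern = /\s*(\d+|[+\-*\/])/y; // sticky: only match at lastIndex
  const tokens = [];
  let position = 0;
  while (position < text.length) {
    pattern.lastIndex = position;
    const match = pattern.exec(text);
    if (match === null) {
      throw new SyntaxError("Unexpected character at index " + position);
    }
    const value = match[1];
    tokens.push({
      type: /\d/.test(value) ? "Numeric" : "Punctuator",
      value
    });
    position = pattern.lastIndex; // continue after the matched token
  }
  return tokens;
}

console.log(tokenize("2 + 3 * 1").map(token => token.value));
// [ '2', '+', '3', '*', '1' ]
```

Syntax analysis would then consume this token array and build the tree, taking operator precedence into account.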

Today we will focus on the work of generators.

Libraries differ in that they perform only one step, or all three steps.

Examples of libraries that implement all three steps:

  • Babel
  • Recast
  • Facebook’s codemod

Examples of libraries that implement only one step:

  • Esprima (parsing)
  • ast-types (AST node manipulation)
  • Escodegen (generation)

How does the compiler build Source Maps

Generating a source map takes three parts, all of which a compiler must do:

1) Transform the code and note the new generated source location
2) Check for differences in location between the original and the generated code
3) Use these mappings to build a source map

This is an oversimplification, and we will dig deeper in the next section.
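The three parts above can be sketched with a toy emitter that tracks where each piece of text lands in the output. The chunk shape and the function name emitWithMappings are assumptions for this illustration; the compiler built later in this article works on AST nodes rather than on prepared chunks.

```javascript
// Each chunk is a piece of generated text, optionally tagged with the
// position it came from in the original source
// (line is 1-based, column is 0-based).
function emitWithMappings(chunks) {
  let code = "";
  let line = 1;
  let column = 0;
  const mappings = [];
  for (const chunk of chunks) {
    if (chunk.original) {
      mappings.push({ generated: { line, column }, original: chunk.original });
    }
    code += chunk.text;
    // Advance the generated position past the emitted text.
    const lines = chunk.text.split("\n");
    if (lines.length > 1) {
      line += lines.length - 1;
      column = lines[lines.length - 1].length;
    } else {
      column += chunk.text.length;
    }
  }
  return { code, mappings };
}

// "number + 1" rewritten as "1 + number"
const result = emitWithMappings([
  { text: "1", original: { line: 1, column: 9 } },
  { text: " + " },
  { text: "number", original: { line: 1, column: 0 } }
]);

console.log(result.code);                  // "1 + number"
console.log(result.mappings[1].generated); // { line: 1, column: 4 }
```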

Build our own compiler

1. Build a JavaScript code generator

We will start with the architecture below. The goal is to end up with a transformed file (index.es5.js) and a source map (index.es5.js.map) after compilation.

Our src/index.es6.js looks like this (a simple add function):

function add(number) {
  return number + 1;
}

globalThis.add = add;

That is our source code before compilation. Now let’s turn to the compiler.

Steps

The compiler must perform the following steps:

1. Parse the code into an AST

Since this article does not focus on parsing, we will use a basic third-party tool for it (esprima or escodegen)

2. Add a shallow copy of each node to the AST

This idea is borrowed from Recast. The idea is that each node will keep itself and a copy of itself (the original node). The copy is used to check whether the node has changed. More on that later.

3. Transform

We will do this by hand, although libraries such as ast-types or @babel/types offer useful APIs for it.

4. Generate source code

Convert the AST to JavaScript

5. Add Source Map support

Steps 4 and 5 are done together. This involves traversing the tree and detecting, through the “original” property, where an AST node has changed. For those nodes, the mapping between the “original” and the “generated” code is stored.

6. Write to build/

Finally, the generated source code and its source map are written to the appropriate files.

Code

Let’s go over the steps again, but this time in more detail.

1. Parse the code into an AST

Using a basic third-party tool (I went with a simple one called abstract-syntax-tree), we read the file contents and pass them to the library’s parser.

import fs from "fs";
import path from "path";
import ast from "abstract-syntax-tree";

const file = "./src/index.es6.js";
const fullPath = path.resolve(file);
const fileContents = fs.readFileSync(fullPath, "utf8");
const sourceAst = ast.parse(fileContents, { loc: true });

2. Add a shallow copy of each node to the AST

First, we define a function called “visit” whose job is to traverse the tree and perform the callback function on each node.

export function visit(ast, callback) {
  callback(ast);

  const keys = Object.keys(ast);
  for (let i = 0; i < keys.length; i++) {
    const keyName = keys[i];
    const child = ast[keyName];
    // Return early at "loc": any keys after it (such as "original") are not visited
    if (keyName === "loc") return;
    if (Array.isArray(child)) {
      for (let j = 0; j < child.length; j++) {
        visit(child[j], callback);
      }
    } else if (isNode(child)) {
      visit(child, callback);
    }
  }
}

function isNode(node) {
  // Guard against null, whose typeof is also "object"
  return node !== null && typeof node === "object" && Boolean(node.type);
}

Here we do the “depth-first traversal” mentioned above. For a given node, it will:

  1. Execute the callback
  2. Check whether the property is “loc” and, if so, return early
  3. Check for any properties that are arrays and, if so, call visit on each child
  4. Check for any properties that are AST nodes and, if so, call visit on that node

Next we start cloning.

export const cloneOriginalOnAst = ast => {
  visit(ast, node => {
    const clone = Object.assign({}, node);
    node.original = clone;
  });
};

The cloneOriginalOnAst function creates a copy of each node and attaches it to that node as “original”.

We clone with Object.assign, which makes a shallow copy: the top-level properties are copied, while nested properties are still shared by reference, so changing them also changes the values seen through the clone. We could use the spread operator here too, as it does the same thing. We compare at the top level, which is enough to compare two AST nodes and decide whether a node has changed.

In general, our code here will return the same tree, except with the “original” attribute on each node.
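The shallow-copy behaviour described above is easy to check in isolation. The node below is a hand-written stand-in for what a parser would produce, used only for this illustration.

```javascript
// A hand-written node standing in for one produced by the parser.
const node = {
  type: "Identifier",
  name: "number",
  loc: { start: { line: 1, column: 0 } }
};
const clone = Object.assign({}, node); // same result as { ...node }

node.name = "renamed";     // top-level change: the clone keeps the old value
node.loc.start.column = 7; // nested change: the object is shared by reference

console.log(clone.name);             // "number"
console.log(clone.loc.start.column); // 7
```

This is why a later comparison only notices a moved node if its loc is replaced with a new object rather than mutated in place.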

3. Transform

Next we manipulate the nodes. We will keep it simple and just swap two nodes of our program. So we go from:

number + 1

to:

1 + number

Simple in theory, right?

Here is the code for our swap:

// Swap: "number + 1"
// - clone left node
const leftClone = Object.assign(
  {},
  sourceAst.body[0].body.body[0].argument.left
);
// - replace left node with right node
sourceAst.body[0].body.body[0].argument.left =
  sourceAst.body[0].body.body[0].argument.right;
// - replace right node with left clone
sourceAst.body[0].body.body[0].argument.right = leftClone;
// Now: "1 + number". Note: loc is wrong 

We did not use a clean API for this (many libraries provide one); we swapped the two nodes by hand.

An example of using a library with a helpful API is shown below, taken from the ast-types documentation.

That approach is certainly safer, easier to follow and faster to develop with. So in general I recommend it for any complex AST manipulation, which is what most well-known compilers do.

4. Generate source code

Code generators are typically housed in a single file of several thousand lines. For example, escodegen’s generator is 2,619 lines (see here). That is on the small side compared to others.

I have used much of the same code for our compiler (since most generators need very similar logic to turn an AST into JavaScript), but only what is absolutely necessary to handle the code in our index.es6.js file.

A) Node processors and character utilities

These are generic utility functions used to process AST nodes (depending on the type; for example, a function declaration will have an identifier) and to build the source code. They also include some common character constants (such as a “space”). They are called from the type statements in the next section.

I would not worry too much about the details here unless you plan on writing a compiler. This is largely borrowed from escodegen’s generator.

// Common characters
const space = " ";
const indent = space + space;
const newline = "\n";
const semicolon = ";"; // USUALLY flags on this

// Utility functions
function parenthesize(text, current, should) {
  if (current < should) {
    return ["(", text, ")"];
  }
  return text;
}
const generateAssignment = (left, right, operator, precedence) => {
  const expression = [
    generateExpression(left),
    space + operator + space,
    generateExpression(right)
  ];
  return parenthesize(expression, 1, precedence).flat(); // FLATTEN
};
const generateIdentifier = id => {
  return id.name;
};
const generateFunctionParams = node => {
  const result = [];
  result.push("(");
  result.push(node.params[0].name); // USUALLY lots of logic to grab param name
  result.push(")");
  return result;
};
const generateStatement = node => {
  const result = Statements[node.type](node);
  return result;
};
const generateFunctionBody = node => {
  const result = generateFunctionParams(node);
  return result.concat(generateStatement(node.body)); // if block generateStatement
};
const generateExpression = node => {
  const result = Statements[node.type](node);
  return result;
};

B) Type statements

This is an object holding functions keyed by AST node type. Each function contains the logic needed to process that AST node type and generate source code. For example, for a function declaration it covers all possible variations of arguments, identifiers, logic and return types. Recursion is common here: one type statement may trigger another type statement, which may trigger another, and so on.

Here we only have the statement functions needed to process our “index.es6.js” file, so it is fairly limited. You can see how much code is required just to process an AST of 3-4 lines of code (on top of the utilities above).

Again, this is borrowed from escodegen, so feel free to ignore the details unless you plan to write your own compiler.

const Statements = {
  FunctionDeclaration: function(node) {
    let id;
    if (node.id) {
      id = generateIdentifier(node.id);
    } else {
      id = "";
    }
    const body = generateFunctionBody(node);
    return ["function", space, id].concat(body); // JOIN
  },
  BlockStatement: function(node) {
    let result = ["{", newline];
    // USUALLY withIndent OR for loop on body OR addIndent
    result = result.concat(generateStatement(node.body[0])).flat();
    result.push("}");
    result.push("\n");
    return result;
  },
  ReturnStatement: function(node) {
    // USUALLY check for argument else return
    return [
      indent,
      "return",
      space,
      generateExpression(node.argument),
      semicolon,
      newline
    ];
  },
  BinaryExpression: function(node) {
    const left = generateExpression(node.left);
    const right = generateExpression(node.right);
    return [left, space, node.operator, space, right];
  },
  Literal: function(node) {
    if (node.value === null) {
      return "null";
    }
    if (typeof node.value === "boolean") {
      return node.value ? "true" : "false";
    }
    return node.value;
  },
  Identifier: function(node) {
    return generateIdentifier(node);
  },
  ExpressionStatement: function(node) {
    const result = generateExpression(node.expression); // was []
    result.push(";");
    return result;
  },
  AssignmentExpression: function(node, precedence) {
    return generateAssignment(node.left, node.right, node.operator, precedence);
  },
  MemberExpression: function(node, precedence) {
    const result = [generateExpression(node.object)];
    result.push(".");
    result.push(generateIdentifier(node.property));
    return parenthesize(result, 19, precedence);
  }
};

C) Process code statements

Finally, we iterate over the program body (that is, each top-level statement) and run our generator on it. This returns an array called “code” that holds every piece of our newly generated source code.

  const code = ast.body
    .map(astBody => Statements[astBody.type](astBody))
    .flat();

6. Write to build/

We will skip step 5 for now and finish the core of the compiler first. In this step we:

  • Add the source map location to the generated code (we build the map itself in the next section)
  • Write out the generated code as a bundle (joining our code array together), and copy the original code so that the browser can see it (this is just one way of doing it).

// Add sourcemap location
code.push("\n");
code.push("//# sourceMappingURL=/static/index.es5.js.map");

// Write our generated and original
fs.writeFileSync(`./build/index.es5.js`, code.join(""), "utf8");
fs.writeFileSync(`./build/index.es6.js`, fileContents, "utf8");

5. Add source map support

Building a source map comes with four requirements:

  1. Store a record of the source file
  2. Store a record of the generated file
  3. Store the line/column mappings
  4. Output a source map file that follows version 3 of the spec

For a quick win we can use a library that almost every JavaScript code generator uses, called source-map. It comes from Mozilla and handles storing points 1-3 as well as encoding the mappings into Base64 VLQ (point 4).

A reminder of what a source map looks like with the mappings highlighted (from earlier):

Mappings is Base64 VLQ, but what is that?

What is Base64 VLQ?

First, a brief introduction to Base64 and VLQ.

Base64

Solves the problem of handling languages that do not have the full ASCII character set. Base64 uses only a subset of ASCII, which is easier to process in different languages.

VLQ (variable-length quantity)

Breaks the binary representation of an integer into a group of small blocks of variable bits.

Base64 VLQ

Optimized to make it easy to map between large numbers and the corresponding information in source files.

A line of code is represented by a series of “segments”. For example, the segment AAAA decodes to the four values 0, 0, 0, 0.

Here’s an example of how a “segment” is built:

Build a basic mapping in JavaScript as follows:

// .. define "item"
const sourceArray = [];
sourceArray.push(item.generated.column);
sourceArray.push(0); // index of "file.es6.js" in the "sources" array
sourceArray.push(item.source.line);
sourceArray.push(item.source.column);
const encoded = vlq.encode(sourceArray);

However, this does not handle line and segment separation (which can get tricky), and in a real map each value is stored relative to the previous segment, so using Mozilla’s library is still the more efficient route.
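To show what the encoding step involves, here is a small self-contained Base64 VLQ encoder. It is a sketch of the scheme described by the source map specification, not the code used by the vlq package or by Mozilla’s library, and the function names are made up for this example.

```javascript
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one signed integer as Base64 VLQ.
function encodeVlq(value) {
  // The sign goes in the lowest bit, the magnitude in the bits above it.
  let vlq = value < 0 ? (-value << 1) | 1 : value << 1;
  let encoded = "";
  do {
    let digit = vlq & 0b11111; // take the lowest 5 bits
    vlq >>>= 5;
    if (vlq > 0) {
      digit |= 0b100000; // continuation bit: more digits follow
    }
    encoded += BASE64_CHARS[digit];
  } while (vlq > 0);
  return encoded;
}

// A segment is a list of integers encoded one after another.
function encodeSegment(fields) {
  return fields.map(encodeVlq).join("");
}

console.log(encodeSegment([0, 0, 0, 0])); // "AAAA"
console.log(encodeVlq(1));                // "C"
console.log(encodeVlq(-1));               // "D"
console.log(encodeVlq(16));               // "gB"
```

Small values fit in a single character, which is why storing each field relative to the previous segment keeps the mappings string short.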

Add source map support

Let’s go back to our compiler.

Using Mozilla’s SourceMapGenerator

To make full use of Mozilla’s library, we will:

  • Create a SourceMapGenerator instance to hold and build our mappings
  • Initialize and store local mappings

So whenever a node has changed, we build its location and add it to both our local mappings and the SourceMapGenerator instance. We keep a local instance so that we can record the start and end of the current location, as this is crucial for building the next location.

import { SourceMapGenerator } from "source-map";

// SourceMap instance
const mozillaMap = new SourceMapGenerator({
  file: "index.es5.js"
});

// Local mappings instance
const mappings = [
  {
    target: {
      start: { line: 1, column: 0 },
      end: { line: 1, column: 0 }
    },
    source: {
      start: { line: 1, column: 0 },
      end: { line: 1, column: 0 }
    },
    name: "START"
  }
];

We need a function that actually handles updating these mapping instances. The buildLocation function below handles all of the location generation logic. Most libraries have a similar function that uses column and line offsets given by the caller.

Its job is to work out the new start and end line and column numbers. It only adds a mapping when a node has changed, which limits the mappings we store.

const buildLocation = ({
  colOffset = 0, lineOffset = 0, name, source, node
}) => {
  let endColumn, startColumn, startLine;
  const lastGenerated = mappings[mappings.length - 1].target;
  const endLine = lastGenerated.end.line + lineOffset;
  if (lineOffset) {
    endColumn = colOffset;
    startColumn = 0; // If new line reset column
    startLine = lastGenerated.end.line + lineOffset;
  } else {
    endColumn = lastGenerated.end.column + colOffset;
    startColumn = lastGenerated.end.column;
    startLine = lastGenerated.end.line;
  }

  const target = {
    start: {
      line: startLine,
      column: startColumn
    },
    end: {
      line: endLine,
      column: endColumn
    }
  };
  node.loc = target; // Update node with new location

  const clonedNode = Object.assign({}, node);
  delete clonedNode.original; // Only useful for check against original
  const original = node.original;
  if (JSON.stringify(clonedNode) !== JSON.stringify(original)) {
    // Push to real mapping. Just START. END is for me managing state
    mozillaMap.addMapping({
      generated: {
        line: target.start.line,
        column: target.start.column
      },
      source: sourceFile, // name of the original file, defined elsewhere in the generator
      original: source.start,
      name
    });
  }

  return { target };
};
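The change check used inside buildLocation can be tried in isolation. In this sketch the helper name hasChanged and the hand-written Literal node are assumptions for illustration; the line and column values correspond to the literal 1 in "  return number + 1;" before and after the swap.

```javascript
// Same comparison as in buildLocation, extracted into a hypothetical helper.
function hasChanged(node) {
  const current = Object.assign({}, node);
  delete current.original; // the copy only exists for this comparison
  return JSON.stringify(current) !== JSON.stringify(node.original);
}

const literal = {
  type: "Literal",
  value: 1,
  loc: { start: { line: 2, column: 18 }, end: { line: 2, column: 19 } }
};
// What cloneOriginalOnAst does for every node:
literal.original = Object.assign({}, literal);

console.log(hasChanged(literal)); // false: nothing has moved yet

// The generator assigns a fresh location object once the node has moved.
literal.loc = { start: { line: 2, column: 9 }, end: { line: 2, column: 10 } };
console.log(hasChanged(literal)); // true
```

Because the clone is shallow, the check relies on loc being replaced with a new object; mutating the shared loc object in place would go unnoticed.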

Now that we have buildLocation, we need to call it throughout the code. Below are a couple of examples: the generateIdentifier processor utility and the Literal AST type statement show how we incorporate it.

// Processor utility
const generateIdentifier = id => {
  mappings.push(
    buildLocation({
      name: `_identifier_ name ${id.name}`,
      colOffset: String(id.name).length,
      source: id.original.loc,
      node: id
    })
  );
  return id.name;
};

// AST type statement function (part of "Statements" object)
Literal: function(node) {
  mappings.push(
    buildLocation({
      name: `_literal_ value ${node.value}`,
      colOffset: String(node.value).length,
      source: node.original.loc,
      node
    })
  );

  if (node.value === null) {
    return "null";
  }
  if (typeof node.value === "boolean") {
    return node.value ? "true" : "false";
  }
  return node.value;
},

We need to apply this throughout the code generator (that is, all node processors and AST type statement functions).

I found this tricky because the mapping between nodes and characters is not always one-to-one. For example, a function may or may not have parentheses around its argument, which has to be taken into account for the character positions on the line. So:

(one) =>

has different character positions from:

one =>
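A quick way to see the difference is to compare the column of each piece in the two forms. The two strings below are made-up examples, with a body added so that they are complete expressions.

```javascript
const withParens = "(one) => one";
const withoutParens = "one => one";

// Column of the parameter name
console.log(withParens.indexOf("one"));    // 1
console.log(withoutParens.indexOf("one")); // 0

// Column of the arrow
console.log(withParens.indexOf("=>"));     // 6
console.log(withoutParens.indexOf("=>"));  // 4
```

A generator therefore has to add the width of any characters it emits around a node (here the parentheses) to its column offsets.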

What most libraries do is add logic and defensive checks based on the information on the AST nodes, so that all scenarios are covered. I would have followed the same practice, except that I only added the code that was absolutely necessary for our index.es6.js.

For the complete usage, see the code for my generator in the repository. It is missing large pieces, but it does the job and contains the building blocks of a real code generator.

The final part is writing our source map contents to the source map file. This is very easy with Mozilla’s library, as it exposes a toString() method that handles the Base64 VLQ encoding and builds all the mappings into a file that conforms to the v3 specification. Nice!

// From our Mozilla SourceMap instance
fs.writeFileSync(`./build/index.es5.js.map`, mozillaMap.toString(), "utf8");

Now the reference we added to ./build/index.es5.js earlier points to a file that exists.

Our compiler is now complete!

That was the last part of the compiler; now let’s confirm that it works.

If we compile the code, it produces a build folder containing three files.

$ npm run compile

These are the original file, the generated file and the source map.

Test our Source map

There is a fantastic website, sokra.github.io/source-map-…, that lets you visualize source map mappings.

The page begins like this:

By putting our three files in, we can now see:

It contains the original code, the generated code, and the decoded map (at the bottom).

A reminder of our previous conversion:

// Swap: "number + 1"
// - clone left node
const leftClone = Object.assign(
  {},
  sourceAst.body[0].body.body[0].argument.left
);
// - replace left node with right node
sourceAst.body[0].body.body[0].argument.left =
  sourceAst.body[0].body.body[0].argument.right;
// - replace right node with left clone
sourceAst.body[0].body.body[0].argument.right = leftClone;
// Now: "1 + number". Note: loc is wrong

We swapped:

number + 1

for:

1 + number

Can we confirm that the mapping is successful?

If we hover over a character or a mapping, it highlights the mapping and its corresponding positions in the generated and original code.

The screenshot below shows what happens when the mouse hovers over the number “1” character. It clearly shows that there is a mapping.

This screenshot shows what happens when I hover over the variable identifier “number” word. It clearly shows that there is a mapping.

What did we miss?

So what are the limitations of building such a compiler?

  • Not all JavaScript statements are covered (only those our file needed)

  • It currently works with only one file. Web bundlers traverse the application, build a dependency graph and apply transformations to those files (see my “The Underbelly of Web Bundlers” article for more on this).

  • Output file vs. bundle. Web bundlers produce bundles with code that can run in specific JavaScript environments; ours is very limited.

  • Basic transformation. It’s not easy to perform additional optimizations without a lot of new code.

Thank you very much for reading. This topic has far-reaching significance and I have learned a lot in the process of research. I really hope this article has helped you understand how the JavaScript compiler and Source Maps work together, including the mechanisms involved.

Source code can be found at Craigtaub/our-own-babel-Sourcemap.