A cartoon deep-dive into the ECMAScript module system

About the author

Lin Clark is a senior engineer at Mozilla who previously worked at npm. At Mozilla she has worked on Rust and WebAssembly. The translator found this excellent article through MDN (Mozilla Developer Network) and wanted to translate it.

Notes

This article was first published on March 28, 2018, so some of its conclusions may be out of date. Please read it with that in mind.

The article is hard to translate, and some terms have no good literal translation. I have therefore kept the original words for those terms and list them here first.

Load: in the original text this sometimes means the network request and sometimes the whole process, from fetching a module through parsing it to finally running it. In this translation the network part is called “request” or “network request”.
ESM: ECMAScript module
CJS: CommonJS module
Module Record(s)

The following is the text:

The ESM specification brings a standardized module system to JavaScript. But the process was lengthy, with standards taking nearly a decade to develop.

The wait is almost over. With the release of Firefox 60 in May 2018, all major browsers will support ESM, and the Node.js modules team is also working on adding ESM support. In addition, ES module integration for WebAssembly is in the works.

Many JS developers know that ESM has been controversial since its launch, but few know how it works.

Let’s take a look at what problems ESM solves and how it differs from other module systems.

What problems does modularity solve?

When you think about it, programming in JavaScript is mostly about managing variables: assigning a value to a variable, adding a number to a variable, or combining two variables and putting the result into a new one.

Because so much of your code is updating variables, how you organize and manage those variables will affect your ability to write good, maintainable code.

Things are much simpler if you only have to think about a few variables at a time. JavaScript helps with this through scope: a function cannot access variables that are declared inside other functions.

This is convenient because when you write a function, you only need to consider the variables defined in that function and don’t have to worry about other functions changing them.

However, the downside is that variable sharing between functions becomes difficult.

If you want to share a variable outside of the function scope, a common way is to put it in the outer scope, that is, the global scope.
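
A minimal sketch of the scope behaviour described above. The function and variable names (makeSecret, peek, sharedCounter) are invented for this illustration.

```javascript
// Each function has its own scope, so `secret` is invisible outside makeSecret.
function makeSecret() {
  const secret = 42;
  return secret + 1;
}

function peek() {
  // `secret` is not declared in this scope; typeof avoids a ReferenceError.
  return typeof secret;
}

// Sharing through the outer (global) scope: any function can read or change it.
globalThis.sharedCounter = 0;

function increment() {
  globalThis.sharedCounter += 1;
}

increment();
console.log(makeSecret(), peek(), globalThis.sharedCounter); // prints: 43 undefined 1
```

The function scope keeps secret private, while sharedCounter is reachable (and changeable) from everywhere, which is exactly the trade-off discussed next.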

You may remember this from the jQuery days. Before loading any jQuery plug-in, you had to make sure that jQuery itself was available in the global scope.

This may seem OK, but it also creates a lot of annoying problems.

First, all your JavaScript scripts need to load in the correct order, and you need to make sure no one else breaks the order.

If the order is broken, the program throws an error in the middle of running. When a function looks for jQuery in the global scope and cannot find it, it throws an error and stops executing.

This makes the code difficult to maintain. Removing old code (or an entire JavaScript script) is like playing roulette, and you don’t know what’s going to happen. The dependencies between scripts are implicit. Any function can access any variable in the global scope, so you don’t know which function depends on which script.

The second problem is that, because these variables live in the global scope, any code inside that scope can change them. Malicious code can change a variable on purpose to make your code do something you did not intend, and even non-malicious code can accidentally clobber it.
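
Both problems can be reproduced in a few lines. In this sketch two "scripts" are modelled as plain functions that talk to each other only through a global; fakeLib is a made-up stand-in for a library such as jQuery, not a real API.

```javascript
// Two "scripts" modelled as functions that communicate only through globals.
function libraryScript() {
  globalThis.fakeLib = { version: "1.0", plugins: [] };
}

function pluginScript() {
  // Implicit dependency: nothing here says that libraryScript must run first.
  globalThis.fakeLib.plugins.push("carousel");
}

function runInOrder(scripts) {
  delete globalThis.fakeLib; // start from a clean global scope
  try {
    scripts.forEach((script) => script());
    return "ok";
  } catch (err) {
    return err.constructor.name;
  }
}

const rightOrder = runInOrder([libraryScript, pluginScript]);
const wrongOrder = runInOrder([pluginScript, libraryScript]);

// Second problem: any other code can overwrite the shared global.
globalThis.fakeLib = "oops";
const clobbered = typeof globalThis.fakeLib;

console.log(rightOrder, wrongOrder, clobbered); // prints: ok TypeError string
```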

How does modularity solve these problems?

Modularity provides a better way to organize these variables and functions. With modules, you can group related variables and functions together.

These variables and functions are in the module scope and can be shared by functions under that scope.

Unlike function scope, however, module scope also provides a way to share internal variables with other modules, and it requires an explicit declaration of what is shared.

Variables that are made available to other modules are called exports. Once a variable has been exported, other modules can explicitly import it.

Since this is an explicit relationship, it is easy to see which modules are broken when a module is removed.
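
To make the idea of a private scope with explicit exports concrete, here is a hand-rolled stand-in for a module, written as a closure so that it runs as a single script. It is only an analogy: in real ESM the same intent would be expressed with the export and import keywords, and counterModule is a name invented for this sketch.

```javascript
// Everything inside the function is private unless it is listed in the
// returned "exports" object.
function counterModule() {
  let count = 0;  // private to the module scope
  const step = 1; // private helper value

  function increment() {
    count += step;
    return count;
  }

  // The explicit declaration of what other code may use.
  return { increment };
}

const counter = counterModule();
counter.increment();
const second = counter.increment();

console.log(second, "count" in counter, "step" in counter); // prints: 2 false false
```

Because the list of exports is written down in one place, it is easy to see what other code could possibly depend on.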

Once variables can be imported and exported from one module to another, it will also be easy to split code, which can be used independently or combined freely, like Lego blocks, to create different applications from the same set of modules.

Because modules are so useful, there have been many attempts to add them to JavaScript. Two module systems are in wide use today: CommonJS (CJS), which Node.js has used for a long time, and ECMAScript modules (ESM), which were added to the JavaScript language specification more recently. Browsers already support ESM, and Node.js is working on it.

Let’s take a closer look at how this new module system, ESM, works.

How does ESM work?

When you develop with modules, you are actually building a graph of module dependencies. The connections between modules come from the import statements.

The import statements tell the browser or Node exactly what code to load. You give it one file to use as the entry point of the graph, and from there it follows the import statements to find the rest of the code.

The browser cannot use these files directly, though. It has to parse each of them into a data structure called a Module Record, so that it knows what is going on in the file.

After that, each Module Record needs to be turned into a module instance, which combines two things: the code and the state.

The code is a set of instructions, like a recipe for how to make a dish. But a recipe alone is not enough; you also need the raw materials.

What is state? State gives you those raw materials. It is the actual values of the variables at any point during execution. Of course, these variables are just nicknames for the places in memory where the values are stored.

So a module instance combines the code (a list of instructions) with the state (the values of all the variables).

We need a module instance for each module. Module loading is the process of going from the entry file to a full graph of module instances.

For ESM, this process is divided into three steps:

Construction: find, download, and parse all of the files into Module Records.
Instantiation: allocate places in memory for all of the exported values, without filling in the values yet, and then make both the exports and the imports point to those places. This is called linking.
Evaluation: run the code to fill those places in memory with the actual values of the exported variables.
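
The three steps above can be sketched as a toy loader. This is a simplified model under stated assumptions, not how an engine is actually implemented: the "files" are hand-written objects instead of parsed source text, and every name in it (files, construct, instantiate, evaluate, the cell objects) is invented for the illustration.

```javascript
// Toy "files": what a parser would extract is written out by hand here.
const files = {
  "main.js": {
    imports: [{ from: "counter.js", name: "count" }],
    exportNames: [],
    run(env) { env.log.push(`main sees count = ${env.imports.count.value}`); },
  },
  "counter.js": {
    imports: [],
    exportNames: ["count"],
    run(env) { env.exports.count.value = 5; },
  },
};

// Phase 1, construction: find every file and turn it into a record.
function construct(entry, records = new Map()) {
  if (records.has(entry)) return records;
  records.set(entry, { url: entry, ...files[entry] });
  files[entry].imports.forEach((imp) => construct(imp.from, records));
  return records;
}

// Phase 2, instantiation: allocate an empty cell per export, then link imports
// to the very same cell objects.
function instantiate(records) {
  for (const record of records.values()) {
    record.exports = Object.fromEntries(
      record.exportNames.map((name) => [name, { value: undefined }])
    );
  }
  for (const record of records.values()) {
    record.importCells = Object.fromEntries(
      record.imports.map((imp) => [imp.name, records.get(imp.from).exports[imp.name]])
    );
  }
}

// Phase 3, evaluation: run the code, dependencies first, to fill the cells.
function evaluate(records, url, log, done = new Set()) {
  if (done.has(url)) return;
  done.add(url);
  const record = records.get(url);
  record.imports.forEach((imp) => evaluate(records, imp.from, log, done));
  record.run({ imports: record.importCells, exports: record.exports, log });
}

const records = construct("main.js");
instantiate(records);
const cellBeforeEvaluation = records.get("counter.js").exports.count.value;
const log = [];
evaluate(records, "main.js", log);

console.log(cellBeforeEvaluation, log);
```

After instantiation the cell for count exists but is still empty; only evaluation puts 5 into it, and main.js reads that value through the same cell.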

People talk about ESM being asynchronous. You can think of it that way because the work is split into three phases (loading, instantiating, and evaluating) and those phases can be done separately.

This means that the ESM specification introduces a kind of asynchrony that CJS does not have (I’ll explain more later). In CJS, a module and the dependencies below it are loaded, instantiated, and evaluated all at once, with no breaks in between.

However, the steps themselves are not necessarily asynchronous; they can be performed synchronously, depending on what is doing the loading. That is because not everything is controlled by the ESM specification. The work is actually split into two halves, which are covered by different specifications.

The ESM specification explains how to turn files into module records, how to instantiate modules, and how to run module code, but it does not explain how to get module files in the first place.

The loader is responsible for fetching the module files, and it is defined in a different specification (in browsers, the HTML specification). Different platforms can use different loaders.

The loader also controls exactly how modules are loaded. It calls the methods defined by the ESM specification, such as ParseModule, Module.Instantiate, and Module.Evaluate, a bit like a puppeteer (the loader) pulling the strings of a puppet (the JS engine).

Let’s walk through each step in detail:

Construction

The construction process is divided into three steps:

Work out where to download the module file from (also known as module resolution)
Fetch the file (by downloading it from a URL or loading it from the file system)
Parse the file into a Module Record

Find and retrieve files

The loader is responsible for finding and fetching the files. First it needs to find the entry file. In HTML, you tell the loader where to find it with a script tag, for example <script src="main.js" type="module"></script>.

But how do you find the other module files main.js depends on?

This is done through the import statement. One part of the import statement is called the module specifier, which tells the loader where to find the next module.

One thing to note about module specifiers is that they sometimes need to be handled differently in browsers and in Node. Each host has its own way of interpreting module specifier strings, called a module resolution algorithm, and the algorithm differs between platforms. Currently, some module specifiers that work in Node do not work in the browser, but there is ongoing work to fix this.
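
As a rough sketch of what one resolution algorithm can look like, the function below treats a specifier as a URL relative to the importing module and rejects everything else, which approximates the browser behaviour. The function name resolveSpecifier and the example.com URLs are placeholders chosen for this example; a real host does more than this.

```javascript
// Resolve a module specifier against the URL of the importing module.
function resolveSpecifier(specifier, importerUrl) {
  const isUrlLike =
    specifier.startsWith("./") ||
    specifier.startsWith("../") ||
    specifier.startsWith("/") ||
    /^[a-z][a-z0-9+.-]*:/i.test(specifier); // absolute URL with a scheme
  if (!isUrlLike) {
    // A "bare" specifier such as "lodash" is rejected in this sketch.
    throw new TypeError(`Cannot resolve bare specifier "${specifier}"`);
  }
  return new URL(specifier, importerUrl).href;
}

const importer = "https://example.com/app/main.js";
const sibling = resolveSpecifier("./counter.js", importer);
const parent = resolveSpecifier("../lib/util.js", importer);

console.log(sibling); // prints: https://example.com/app/counter.js
console.log(parent);  // prints: https://example.com/lib/util.js
```

A Node-style algorithm would accept a bare name such as "lodash" and look it up in node_modules, which is the kind of difference between platforms mentioned above.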

Until that is fixed, browsers only accept URLs as module specifiers, and they load the module file from that URL.

But that does not happen for the whole graph at once. You don’t know which dependencies a module needs until you have parsed the file, and you can’t parse the file until you have fetched it.

This means that we must traverse the dependency tree layer by layer, parse a module file, determine its dependencies, and then find and retrieve those module files.
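
The layer-by-layer discovery can be sketched as follows. Two loud assumptions: the network is replaced by an in-memory object (fakeNetwork), and the "parser" is a regular expression that only recognises the simple import forms used here, so it is not a real JavaScript parser.

```javascript
// In-memory stand-in for the network: URL -> source text.
const fakeNetwork = {
  "/main.js": 'import { count } from "/counter.js";\nimport "/display.js";',
  "/counter.js": 'import "/util.js";\nexport let count = 5;',
  "/display.js": 'import "/util.js";',
  "/util.js": "export const noop = () => {};",
};

// Very rough "parser": extracts the specifier of each import statement.
function findImports(source) {
  const pattern = /import\s+(?:[^"']*?\s+from\s+)?["']([^"']+)["']/g;
  return [...source.matchAll(pattern)].map((match) => match[1]);
}

// Walk the graph one layer at a time: a file's dependencies only become
// known after that file has been "fetched" and "parsed".
function discover(entry) {
  const seen = new Set([entry]);
  const layers = [];
  let current = [entry];
  while (current.length > 0) {
    layers.push(current);
    const next = [];
    for (const url of current) {
      for (const dep of findImports(fakeNetwork[url])) {
        if (!seen.has(dep)) {
          seen.add(dep);
          next.push(dep);
        }
      }
    }
    current = next;
  }
  return layers;
}

const layers = discover("/main.js");
console.log(layers);
```

Here the walk needs three rounds: main.js first, then counter.js and display.js, and only then util.js, which neither round before it could have known about.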

If the main thread had to wait for each of these files to download, a lot of other tasks would pile up in its queue.

That is because downloading takes a long time when you are working in a browser.

Blocking the main thread like this would make an app that uses modules too slow to use. This is one of the reasons the ESM specification splits the loading algorithm into multiple phases. Separating construction into its own phase lets the browser fetch all of the files and build up its understanding of the module graph before getting down to the synchronous work of instantiating.

Splitting the loading algorithm into stages is one of the main differences between ESM and CJS.

CJS can do things differently because loading a file from the file system takes much less time than downloading it across the network. This means that Node can afford to block the main thread while it loads the file. And since the file is already loaded, it makes sense to instantiate and evaluate it right away (these are not separate phases in CJS). It also means that you walk down the whole tree, loading, instantiating, and evaluating each dependency, before you return the module instance.

The CJS approach has a few implications, which I’ll explain in more detail later. One of them is that in Node with CJS modules you can use variables in a module specifier: all of the code in the module up to the require() statement has already run, so the variable has a value by the time module resolution happens.

But in ESM, the entire module graph is built before any code is evaluated. This means you can’t use variables in module specifiers, because those variables don’t have values yet.

Sometimes it is convenient to use variables in the module path, for example, you may need to decide which module to load based on the result of the code execution, or the different runtime environment.

To make this possible in ESM, there is a proposal called dynamic import. With it, you can use an import expression such as import(`${path}/foo.js`).
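
A short sketch of how such a call might be wrapped. The directory layout and the helper names (specifierFor, loadMessages) are hypothetical; only the import() expression itself comes from the proposal.

```javascript
// Build the specifier at run time. `localeDir` and the file naming scheme
// are assumptions made for this illustration.
function specifierFor(localeDir, language) {
  return `${localeDir}/${language}.js`;
}

// import() returns a promise for the module namespace object, so the caller
// decides when the extra module graph is loaded.
function loadMessages(localeDir, language) {
  return import(specifierFor(localeDir, language));
}

console.log(specifierFor("./locales", "fr")); // prints: ./locales/fr.js
```

A caller would then write something like loadMessages("./locales", "fr").then((messages) => ...), assuming a file exists at that path.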

The way this works is that any file loaded with import() is handled as the entry point of a separate graph, and that graph is processed separately.

Note, however, that any module that appears in both graphs shares a single module instance, because the loader caches module instances. For each module in a particular global scope, there is only one module instance.

This means less work for the engine. For example, a module file is fetched only once, even if several modules depend on it. (That is one reason to cache modules. We will see another when we get to evaluation.)

The loader manages this cache using something called a module map. Each global scope keeps track of its modules in a separate module map.

When the loader goes to fetch a URL, it puts that URL in the module map and makes a note that the file is currently being fetched. Then it sends out the request and moves on to the next file.

What happens if another module depends on the same file? The loader looks up each URL in the module map. If it sees the fetching flag there, it does not request the file again and just moves on to the next URL.

As you will see later, the module map does more than keep track of which files are being fetched. It also serves as a cache for the modules.
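
The bookkeeping described above fits in a few lines. This is a simplified model: the string "fetching" stands in for the flag, no real request is sent, and the function names are invented for the sketch.

```javascript
// Toy module map: URL -> "fetching" while a request is in flight,
// later replaced by the parsed record.
const moduleMap = new Map();
const requestLog = [];

function requestModule(url) {
  if (moduleMap.has(url)) {
    // Already fetching or already fetched: do not issue a second request.
    return false;
  }
  moduleMap.set(url, "fetching");
  requestLog.push(url); // a real loader would send the network request here
  return true;
}

function finishFetch(url, record) {
  moduleMap.set(url, record);
}

requestModule("/main.js");
requestModule("/counter.js"); // imported by main.js
requestModule("/util.js");    // imported by counter.js
requestModule("/util.js");    // imported again by another module: skipped
finishFetch("/util.js", { url: "/util.js", source: "export const x = 1;" });

console.log(requestLog.length, moduleMap.get("/counter.js")); // prints: 3 fetching
```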

Parse the file

Now that we have fetched the file, we need to parse it into a Module Record. This helps the browser understand what the different parts of the module are.

Once the Module Record has been created, it is placed in the module map. From then on, whenever the module is requested, the loader can pull it from the module map.

There is one detail of parsing that may seem trivial but actually has big implications. All modules are parsed as if they had “use strict” at the top. There are other small differences as well: for example, the keyword await is reserved in a module’s top-level code, and the value of this is undefined.

This different way of parsing is called a “parse goal”. If you parse the same file with a different goal, you end up with different results. So before you start parsing, you need to know what kind of file you are parsing: whether it is a module or not.

In the browser this is easy: you just put type="module" on the script tag, which tells the browser that the file should be parsed as a module. And since only modules can be imported, the browser knows that any imports are modules too.

In Node, however, you don’t use HTML tags, so you cannot use a type attribute. One way the community has tried to solve this is with the .mjs file extension, which tells Node that the file is a module. You will see people talking about this as the signal for the parse goal. The discussion is still ongoing, so it is unclear what signal the Node community will decide to use in the end.

Either way, the loader determines whether to parse the file as a module. If it is a module and it has imports, the loader starts the process over again until all of the files have been fetched and parsed.

At the end of this step, we have gone from having just an entry file to having a set of Module Records.

The next step is to instantiate the module and link the module instances together.

Instantiation

As I mentioned earlier, an instance combines code with state, and state lives in memory, so the instantiation step is all about wiring things up to memory.

First, the JS engine creates a Module Environment Record, which manages the variables for the Module Record. Then it finds places in memory for all of the exports. The Module Environment Record keeps track of which place in memory is associated with each export.

These places in memory do not hold their values yet; the actual values are only filled in during evaluation. There is one caveat: any exported function declarations are initialized during this phase, which makes evaluation simpler.

To instantiate the module graph, the engine does what is called a depth-first post-order traversal. This means that it goes down to the bottom of the graph, to the dependencies that do not depend on anything else, and sets up their exports first.

When the engine has finished wiring up all of the exports below a module (that is, all of the exports that the module depends on), it comes back up a level and wires up the imports of that module.

Note that both the export and the import point to the same location in memory. Wiring up the exports first guarantees that every import can be connected to a matching export.
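
The visiting order of such a traversal is easy to check on a small graph. The graph below is made up for this sketch; each key lists the modules it imports.

```javascript
// Hypothetical dependency graph: module -> modules it imports.
const graph = {
  "main.js": ["counter.js", "display.js"],
  "counter.js": ["util.js"],
  "display.js": ["util.js"],
  "util.js": [],
};

// Depth-first post-order: handle all dependencies before the module itself.
function postOrder(entry, visited = new Set(), order = []) {
  if (visited.has(entry)) return order;
  visited.add(entry);
  for (const dep of graph[entry]) {
    postOrder(dep, visited, order);
  }
  order.push(entry); // only reached once everything below has been handled
  return order;
}

const order = postOrder("main.js");
console.log(order.join(" -> ")); // prints: util.js -> counter.js -> display.js -> main.js
```

Because util.js is handled first, its exports already exist by the time counter.js and display.js need to link their imports to them.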

This is different from CJS modules, where the entire export object is copied on export. This means that any exported values (such as numbers) are copies, so if the exporting module changes a value later, the importing module does not see the change.

In contrast, ESM uses something called live bindings. The export and the import point to the same location in memory, so when the exporting module changes a value, the change shows up in the importing module.

An export module can update exported values at any time, but an import module cannot update its imported values. However, if you are importing objects, the import module can update the object’s property values.
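
The difference between a copied value and a live binding can be imitated in a single script. This is an analogy only: the closure plays the exporting module, a plain object property plays the CJS-style copy, and a getter-only property plays the live binding. None of these names exist in either module system.

```javascript
// Exporting "module", modelled as a closure with one variable.
function createCounter() {
  let count = 0;
  const increment = () => { count += 1; };

  // CJS-like export: the current value of `count` is copied into an object.
  const copied = { count, increment };

  // ESM-like export: a getter-only view that always reads the live variable.
  const live = Object.freeze({
    get count() { return count; },
    increment,
  });

  return { copied, live };
}

const { copied, live } = createCounter();
live.increment();
live.increment();

// The importing side may read the binding but not assign to it.
function tryToAssign(target) {
  "use strict";
  try {
    target.count = 100;
    return "assigned";
  } catch (err) {
    return err.constructor.name;
  }
}

console.log(copied.count, live.count, tryToAssign(live)); // prints: 0 2 TypeError
```

The copy still shows the value from the moment of export, while the live view follows the exporter's updates and refuses assignment from the outside.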

Live bindings are used because they let all of the modules be linked without running any code. This helps with evaluation when there are circular dependencies, as I’ll explain in the next section.

At the end of this step, all of the module instances and the memory locations for their exported and imported variables are wired up.

Now we can start evaluating the code and filling those memory locations with their values.

Evaluation

The final step is to fill in these places in memory. The JS engine does this by executing the top-level code, that is, the code outside of functions.

Besides filling in memory, evaluating the code can also trigger side effects. For example, a module might make a call to a server.

Because of these potential side effects, you only want to evaluate a module once. Unlike the linking done during instantiation, which gives exactly the same result no matter how many times it is repeated, evaluation can give different results depending on how many times you do it.

This is another reason to have the module map. The module map caches modules by their URL, so that there is only one Module Record for each module, which ensures that each module is executed only once. As with instantiation, evaluation is done as a depth-first post-order traversal.
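
A minimal sketch of why caching by URL matters for side effects. The counter serverCalls stands in for a real side effect such as a network call, and the module body and its URL are invented for the example.

```javascript
const evaluated = new Map(); // URL -> exports of modules that have already run
let serverCalls = 0;         // stands in for a side effect such as a network call

// Hypothetical module bodies keyed by URL.
const moduleBodies = {
  "/analytics.js": () => {
    serverCalls += 1; // the side effect
    return { ready: true };
  },
};

function evaluateOnce(url) {
  if (!evaluated.has(url)) {
    evaluated.set(url, moduleBodies[url]());
  }
  return evaluated.get(url);
}

// Two different importers ask for the same module.
const first = evaluateOnce("/analytics.js");
const second = evaluateOnce("/analytics.js");

console.log(serverCalls, first === second); // prints: 1 true
```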

Circular dependencies were mentioned earlier, so let’s look at this problem:

With a circular dependency, you end up with a loop in the graph. In practice such a loop is usually long, but to explain the problem I’ll use a contrived example with a short loop.

Let’s look at how this would work with CJS modules first. The main module executes up to its require() statement, and then goes off to load the counter module, counter.js.

The counter module then tries to access message from the exports of the main module. But message has not been evaluated in the main module yet, so this returns undefined. The JS engine allocates memory for the local variable in the counter module and sets its value to undefined (a copy is made).

Evaluation then continues to the end of the counter module’s top-level code. We want to see whether we will eventually get the correct value of message (after main.js has been evaluated), so we set a timer. Then evaluation resumes in main.js.

Back in main.js, the message variable is initialized and added to memory. But since there is no connection between it and the local variable in the counter module, the local variable stays undefined.

If the export were handled with live bindings, the counter module would eventually see the correct value: by the time the timer runs, main.js has finished evaluating and has filled in the value.
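
The two outcomes can be reproduced in one script. This is a simulation under stated assumptions: each module is a plain function, the timer is replaced by a callback (checkLater) that is invoked after main.js has finished, and the string "Eval complete" is just a sample value.

```javascript
// CJS-like run: the counter module copies main's `message` at require() time.
function runCjsLike() {
  const mainExports = {};
  let checkLater;

  function counterModule() {
    const message = mainExports.message; // main has not exported yet: undefined
    checkLater = () => message;          // stands in for the timer callback
  }

  counterModule();                        // main.js reaches require("./counter.js")
  mainExports.message = "Eval complete";  // main.js finishes evaluating
  return checkLater();                    // the "timer" fires
}

// Live-binding run: the counter module keeps a reference to the cell,
// not to the value that was in it at import time.
function runLiveBinding() {
  const messageCell = { value: undefined }; // allocated during instantiation
  let checkLater;

  function counterModule() {
    checkLater = () => messageCell.value;   // read only when the "timer" fires
  }

  counterModule();                          // counter.js is evaluated first
  messageCell.value = "Eval complete";      // main.js fills in the value
  return checkLater();
}

console.log(runCjsLike(), runLiveBinding()); // prints: undefined Eval complete
```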

Supporting these cycles was a big part of the rationale behind the design of ESM. It is the three-phase design that makes them possible.

What is the status of ESM?

With the release of Firefox 60 in early May, all major browsers support ESM by default. Node is also adding support, with a working group dedicated to figuring out compatibility between CJS and ESM.

This means that you can now use a script tag with type="module", together with import and export. However, more module features are still to come. The dynamic import proposal is at Stage 3 of the standardization process, the import.meta proposal will help support Node.js use cases, and the module resolution proposal will help smooth over differences between browsers and Node. So you can expect working with modules to get even better in the future.