An in-depth understanding of ES Modules (hand-drawn examples)

Although it took nearly a decade of standardization work to get here, ES modules finally bring a formal, standardized module system to JavaScript.

The long wait is almost over. With the release of Firefox 60 in May (currently in beta), all major browsers will support ES modules, and the Node modules working group is currently adding ES module support to Node.js. ES module integration for WebAssembly is also in the works.

Many JavaScript developers know that ES modules have been somewhat controversial, but few really understand how they work.

Let’s look at what problem ES modules solve and how they differ from other module systems.

What problem do modules solve?

If you think about it, coding in JavaScript is all about managing variables: assigning values to variables, or combining two variables and assigning the result to another variable.

Because so much of your code changes variables, how you organize them can have a big impact on how you code and how your code is maintained.

It makes things a lot easier when you only need to consider a few variables at a time, and JavaScript has a way of helping you achieve this: scope. Because of scopes, functions cannot access variables defined inside other functions.

This is great. This means that when you focus on implementing a function, you only need to focus on implementing that function and don’t have to worry about other functions affecting the variables in your function.

However, this has a drawback: it makes it harder to share variables between different functions.

So what if you do want to share a variable outside of its scope? The usual approach is to put it in a scope above the current one, such as the global scope.

If you remember the days when you used jQuery, you had to make sure jQuery was already in global scope before you loaded any jQuery plug-ins.

This works, but it creates some annoying problems.

First, all your script tags must be placed in the correct order. Then you have to be careful and make sure that these scripts don’t interfere with each other.

If you do accidentally mess up the order, your application throws an error at run time. When a function looks for jQuery where it expects it, in the global scope, and does not find it, the function throws an error and stops executing.

This makes code maintenance tricky. Removing old code or script tags is like playing roulette: you never know what might break. Dependencies between different parts of the code are implicit. Any function can grab anything in the global scope, so you have no way of knowing which function depends on which script tag.

Second, since your variables are in a global scope, any code that is in that scope can change those variables. Malicious code can change these variables to make your code do things you don’t intend, or non-malicious code can accidentally break your variables.
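
Both problems can be reproduced in a few lines. This is a minimal sketch rather than code from any real library: each function stands in for the contents of one script tag, and the names libraryScript, pluginScript, unrelatedScript, sharedLib, and runScripts are all hypothetical.

```javascript
// Each function stands in for one <script> file that talks to the others
// only through the global scope. All names here are hypothetical.
function libraryScript() {
  globalThis.sharedLib = { version: "1.0", plugins: [] };
}

function pluginScript() {
  // Implicit dependency: throws a TypeError if sharedLib is not usable yet.
  globalThis.sharedLib.plugins.push("tooltip");
}

function unrelatedScript() {
  // Nothing stops other code from overwriting the shared global.
  globalThis.sharedLib = null;
}

// Run the "script tags" in the given order and report the first error, if any.
function runScripts(scripts) {
  delete globalThis.sharedLib;
  try {
    for (const script of scripts) script();
    return "ok";
  } catch (error) {
    return error.name;
  }
}

const rightOrder = runScripts([libraryScript, pluginScript]);
const wrongOrder = runScripts([pluginScript, libraryScript]);
const clobbered = runScripts([libraryScript, unrelatedScript, pluginScript]);

console.log(rightOrder, wrongOrder, clobbered); // ok TypeError TypeError
```

Nothing in pluginScript says that it needs libraryScript to run first, which is exactly the hidden dependency described above.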

How do modules help?

Modules give you a better way to organize variables and functions. With modules, you group the variables and functions that make sense to go together.

The module puts these functions and variables into a module scope. Module scope enables different functions in a module to share these variables.

But unlike function scopes, module scopes have a way of making their variables accessible to other modules. They can explicitly specify which variables, classes, or functions in a module can be accessed by other modules.

When something is made available to other modules, it is called an export. Once a module has exports, other modules can explicitly state that they depend on particular variables, classes, or functions from it.

Because this relationship is explicit, you can tell which modules will break if you remove another one.

Once you have the ability to export and import variables between modules, it’s easy to break your code into smaller chunks that work independently of each other. You can then combine or recombine these code blocks, like Lego bricks, to create different applications using the same blocks.

Because modules are so useful, there have been many attempts to add them to JavaScript. Today, two module systems are in wide use. CommonJS (CJS) is what Node.js has used historically. ESM (ECMAScript modules) is a newer system that has been added to the JavaScript specification. Browsers already support ES modules, and Node.js is adding support.

Now let’s take a closer look at how this new module system works.

How do ES modules work?

When you develop with modules, you build up a graph of dependencies. The connections between dependencies come from the import statements you use.

These import statements are how the browser or Node knows exactly what code it needs to load. You give it a file to use as the entry point to the graph. From there, it follows the import statements to find the rest of the code.

But the browser cannot use the files themselves. Each file must be parsed and turned into a data structure called a module record. Only then does the browser know what is going on in the file.

After that, each module record needs to be turned into a module instance. A module instance combines two things: the code and the state.

The code is basically a set of instructions. It is like a recipe. But a recipe alone does not make anything; you also need raw materials to use with the instructions.

What is state? State gives you those raw materials. State is the actual values of the variables at any point in time. Of course, these variables are just names for the boxes in memory that hold the values.

So a module instance combines the code (a set of instructions) with the state (the values of all the variables).

What we need is a module instance for each module. Module loading is the process of going from the entry point file to a full graph of module instances.

For ES modules, this happens in three steps:

Construction – find, download, and parse all of the files into module records.
Instantiation – find places in memory to hold all of the exported values (but do not fill them in with values yet), then make both exports and imports point to those places in memory. This is also called linking.
Evaluation – run the code to fill in those memory locations with the variables’ actual values.
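
The three steps can be sketched as a toy model. This is an illustration under stated assumptions, not how any engine is implemented: the sources table stands in for files on a server, each run function stands in for a module's top-level code, and the names construct, instantiate, and evaluate are hypothetical helpers. For brevity, instantiation here uses two simple passes instead of the traversal order a real engine follows.

```javascript
// Toy model of the three phases. `sources` stands in for files on a server;
// each entry lists its imports and has a `run` function in place of real code.
const sources = {
  "main.js": {
    imports: [{ from: "counter.js", name: "count" }],
    exports: ["message"],
    run: (env) => { env.message.value = `count is ${env.count.value}`; },
  },
  "counter.js": {
    imports: [],
    exports: ["count"],
    run: (env) => { env.count.value = 5; },
  },
};

// Phase 1, construction: find, "fetch", and "parse" every file into a record.
function construct(entry, records = new Map()) {
  if (records.has(entry)) return records;
  records.set(entry, { url: entry, ...sources[entry], env: {} });
  for (const { from } of sources[entry].imports) construct(from, records);
  return records;
}

// Phase 2, instantiation: create an empty box per export, then point each
// import at the exporter's box. No module code runs here.
function instantiate(records) {
  for (const record of records.values()) {
    for (const name of record.exports) record.env[name] = { value: undefined };
  }
  for (const record of records.values()) {
    for (const { from, name } of record.imports) {
      record.env[name] = records.get(from).env[name];
    }
  }
}

// Phase 3, evaluation: run dependencies first, then the module itself.
function evaluate(url, records, done = new Set()) {
  if (done.has(url)) return;
  done.add(url);
  const record = records.get(url);
  for (const { from } of record.imports) evaluate(from, records, done);
  record.run(record.env);
}

const records = construct("main.js");
instantiate(records);
const beforeEvaluation = records.get("main.js").env.message.value;
evaluate("main.js", records);
const afterEvaluation = records.get("main.js").env.message.value;

console.log(beforeEvaluation, "->", afterEvaluation); // undefined -> count is 5
```

After instantiation the boxes exist but are empty; only evaluation fills them in.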

People say ES modules are asynchronous. You can think of them that way because the work is split into three distinct phases (loading, instantiation, and evaluation), and those phases can be done separately.

This means that the specification introduces a kind of asynchrony that is not present in CommonJS. I’ll explain more later, but in CJS a module and the dependencies below it are loaded, instantiated, and evaluated all at once, without any breaks in between.

However, these steps do not have to be asynchronous; they can also be done synchronously. It depends on what is doing the loading. That is because not everything is controlled by the ES module specification. There are actually two halves of the work, each covered by a different specification.

The ES module specification explains how files should be parsed into module records and how modules should be instantiated and evaluated. However, it does not specify how to obtain the module.

It is the loader that fetches the files, and the loader is defined in a different specification. For browsers, that is the HTML specification. Other platforms can have different loaders.

The loader also controls exactly how the modules are loaded. It does this by calling the ES module methods: ParseModule, module.instantiate, and module.evaluate. It is a bit like a puppeteer pulling the strings of the JS engine.

Now let’s go through each step in more detail.

Construction

Three things happen for each module during construction:

Figure out where to download the file containing the module from (also known as module resolution)
Fetch the file (by downloading it from a URL or loading it from the file system)
Parse the file into a module record

Find and retrieve files

The loader takes care of finding and downloading the files. First it needs to find the entry point file. In HTML, you tell the loader where to find it with a script tag.

But how does it find the next bunch of modules — the ones main.js directly depends on?

This is where import statements come in. One part of the import statement is called the module identifier (the specification calls it a module specifier). It tells the loader where to find the next module.

One thing worth noting about module identifiers is that they sometimes need to be handled differently in browsers and in Node. Each host has its own way of interpreting module identifier strings. To do this, it uses a module resolution algorithm, which differs between platforms. Currently, some module identifiers that work in Node do not work in browsers, but there is ongoing work to fix this.

Until that is fixed, browsers accept only URLs as module identifiers. They download the module file from that URL. But this does not happen for the whole graph at once: you do not know which dependencies a module needs until you have parsed its file, and you cannot parse the file until you have fetched it.

This means that we have to go through the tree layer by layer: parse one file, work out its dependencies, and then find and load those dependencies.
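
The layer-by-layer walk can be sketched as follows. This is a rough illustration with assumed names: files is a hypothetical in-memory stand-in for a server, and findImports uses a crude regular expression where a real engine uses a full parser.

```javascript
// Hypothetical in-memory "server": URL -> source text.
const files = new Map([
  ["/main.js", 'import { count } from "/counter.js";\nimport "/display.js";'],
  ["/counter.js", 'import "/util.js";\nexport let count = 5;'],
  ["/display.js", 'import "/util.js";'],
  ["/util.js", "export const noop = () => {};"],
]);

// Crude stand-in for a parser: pull the quoted identifier out of static
// import statements. A real engine uses a full parser, not a regex.
function findImports(source) {
  const pattern = /^\s*import\s+(?:[^"']*?\sfrom\s+)?["']([^"']+)["']/gm;
  return [...source.matchAll(pattern)].map((match) => match[1]);
}

// Walk the graph one layer at a time: the dependencies of a layer are only
// known after every file in it has been fetched and parsed.
function discoverLayers(entry) {
  const seen = new Set([entry]);
  const layers = [];
  let current = [entry];
  while (current.length > 0) {
    layers.push(current);
    const next = [];
    for (const url of current) {
      for (const dep of findImports(files.get(url))) {
        if (!seen.has(dep)) {
          seen.add(dep);
          next.push(dep);
        }
      }
    }
    current = next;
  }
  return layers;
}

const layers = discoverLayers("/main.js");
console.log(layers);
// [ [ '/main.js' ], [ '/counter.js', '/display.js' ], [ '/util.js' ] ]
```

Each inner array is one round of fetching. In a browser, every round costs at least one network round trip, which is why this work is kept off the main thread.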

If the main thread had to wait for each of these files to download, a lot of other tasks would pile up in its queue.

That is because downloading takes a long time when you are working in a browser.

Blocking the main thread like this would slow down apps built with modules. This is one of the reasons the ES module specification splits the algorithm into multiple phases. Separating out the construction phase lets the browser fetch the files and build up the module graph before getting down to the synchronous work of instantiation.

This approach of splitting the algorithm into phases is one of the key differences between ES modules and CommonJS modules.

CommonJS can do things differently because loading files from the file system takes much less time than downloading them over the network. This means Node can block the main thread while it loads a file. And since the file is already loaded, it makes sense to instantiate and evaluate it right away (these are not separate phases in CommonJS). This also means that you walk the whole dependency tree, loading, instantiating, and evaluating each dependency, before the module instance is returned.

The CommonJS approach has some implications, which I’ll explain later. One worth noting now is that in Node with CommonJS modules, you can use variables in a module identifier. That works because all of the module’s code up to the require statement has already run before Node looks for the next module. So when Node does module resolution, the variable in the module identifier already has a value.

But with ES modules, the entire module graph is built before any evaluation happens. This means you cannot use variables in module identifiers, because those variables do not have values yet.

However, it is sometimes useful to use variables in module paths. For example, you might switch and load different modules depending on the conditions of the code or the environment in which it is currently running.

To make this possible for ES modules, there is a proposal called dynamic import. With it, you can write an import call such as import(`${path}/foo.js`).

The way this works is that any file loaded with import() is treated as the entry point to a separate graph. The dynamically imported module starts a new graph, which is processed separately.

It is important to note that any module that appears in both graphs shares a single module instance. This is because the loader caches module instances: for each module in a given global scope, there is only one module instance.
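
A small sketch of both points, assuming a host that can import data: URLs (current browsers and Node.js can). The data: URL only stands in for a real file such as `${path}/foo.js` so that the example fits in one file; loadModule and moduleUrl are hypothetical names.

```javascript
// Assumption: the host supports dynamic import of `data:` URLs. The URL
// stands in for a real file path that is only known at run time.
const source = "export const loadedAt = Math.random();";
const moduleUrl = "data:text/javascript," + encodeURIComponent(source);

// The argument to import() is an ordinary expression evaluated at run time,
// so it can be built from variables.
function loadModule(url) {
  return import(url);
}

// Two separate dynamic imports of the same URL resolve to the same module
// instance: the module's code ran once, so loadedAt is identical too.
const sameInstance = Promise.all([loadModule(moduleUrl), loadModule(moduleUrl)])
  .then(([first, second]) => first === second && first.loadedAt === second.loadedAt);

sameInstance.then((result) => console.log("same module instance:", result));
```

If the module were evaluated once per import, the two loadedAt values would differ.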

This means less work for the engine. For example, a module file is fetched only once, even if multiple modules depend on it (this is one of the reasons we need module caching; we’ll cover another reason in the section on evaluation).

The loader manages the module cache through something called a module map. Each global environment tracks its modules in a separate module map.

When the loader goes to fetch a URL, it puts that URL in the module map and notes that the file is currently being fetched. It then sends out the request and moves on to the next file to fetch.

What if another module depends on the same file? The loader looks up each URL in the module map, and if it sees that the file is already being fetched, it moves on to the next URL.

However, a module map doesn’t just keep track of which modules are being loaded; it also acts as a cache for modules, as we’ll see next.
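
A toy version of that bookkeeping, as a sketch only: moduleMap, fetchLog, requestModule, and finishFetch are hypothetical names, and pushing to fetchLog stands in for starting a network request.

```javascript
// Toy module map: URL -> "fetching" or a finished record. `fetchLog` records
// every simulated network request so the deduplication is visible.
const moduleMap = new Map();
const fetchLog = [];

function requestModule(url) {
  if (moduleMap.has(url)) {
    // Already being fetched, or already fetched: no second request.
    return moduleMap.get(url);
  }
  moduleMap.set(url, "fetching");
  fetchLog.push(url); // stands in for starting a network request
  return "fetching";
}

function finishFetch(url, source) {
  // Once the file arrives and is parsed, the record replaces the marker.
  moduleMap.set(url, { url, source });
}

requestModule("/main.js");
requestModule("/util.js");                   // requested by main.js
const secondAsk = requestModule("/util.js"); // requested again by another module
finishFetch("/util.js", "export const noop = () => {};");
const afterFetch = requestModule("/util.js");

console.log(fetchLog);       // [ '/main.js', '/util.js' ]
console.log(secondAsk);      // fetching
console.log(afterFetch.url); // /util.js
```

The same map serves both purposes: while a file is in flight it prevents duplicate requests, and afterwards it hands back the cached record.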

Parsing

Now that we have the module file, we need to parse it into a module record. This helps the browser understand what the parts of the module are.

Once the module record is created, it is placed in the module map. From then on, whenever that module is requested, the loader can pull its record from the map.

There is one detail in parsing that may seem trivial but actually has big implications. All modules are parsed as if they had "use strict" at the top. There are other slight differences too: for example, the await keyword is reserved in a module’s top-level code, and the value of this is undefined.

These different ways of parsing are called parse goals. If you parse the same file with a different goal, you get different results. So you need to know what kind of file you are parsing, module or not, before parsing begins.
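
To see how the same source text can behave differently depending on how it is parsed, here is an analogy rather than a demonstration of the module goal itself: the Function constructor compiles its body as non-strict code unless the body starts with a "use strict" directive. The module goal differs from a script in more ways than strictness, and the variable names below are hypothetical.

```javascript
// Analogy only: strict versus non-strict function bodies stand in for the
// two parse goals. The Function constructor produces non-strict code unless
// the body opts in with a directive.
const sloppyThis = new Function("return this;");
const strictThis = new Function('"use strict"; return this;');

const sloppyAssign = new Function("undeclaredToyVariable = 1; return 'assigned';");
const strictAssign = new Function('"use strict"; undeclaredToyVariable2 = 1; return "assigned";');

// Run a function and report either its result or the name of the error.
function attempt(fn) {
  try {
    return fn();
  } catch (error) {
    return error.name;
  }
}

const results = {
  sloppyThisIsGlobal: sloppyThis() === globalThis,
  strictThisIsUndefined: strictThis() === undefined,
  sloppyAssign: attempt(sloppyAssign),
  strictAssign: attempt(strictAssign),
};
delete globalThis.undeclaredToyVariable; // clean up the accidental global

console.log(results);
// sloppyThisIsGlobal: true, strictThisIsUndefined: true,
// sloppyAssign: 'assigned', strictAssign: 'ReferenceError'
```

The two bodies are nearly identical text, yet one silently creates a global and the other throws.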

In the browser this is easy: you put type=module on the script tag. This tells the browser that the file should be parsed as a module, and because only modules can be imported, the browser knows that everything the file imports is a module as well.

But in Node, there are no HTML tags, so there is no type attribute to use. One way the community has tried to solve this is with the .mjs extension. Using that extension tells Node, “This file is a module.” You will see people refer to this as the signal for the parse goal. The discussion is still ongoing, so it is not yet clear which signal the Node community will settle on.

Either way, the loader decides whether to parse a file as a module. If it is a module and it has imports, the loader repeats the process until all of the files have been fetched and parsed.

And we’re done. At the end of the loading process, you have gone from having just an entry point file to having a set of module records.

The next step is to instantiate the modules and link the instances together.

Instantiation

As I mentioned earlier, a module instance combines code with state. That state lives in memory, so the instantiation step is all about wiring things up in memory.

First, the JS engine creates a module environment record, which manages the variables for the module record. The engine then finds a location in memory for each of the module’s exports. The module environment record keeps track of which location in memory corresponds to which export.

These memory locations do not hold values yet. They are filled in with actual values only after evaluation. There is one caveat to this rule: any exported function declarations are initialized during this phase. This makes evaluation easier.

To instantiate the module graph, the engine does what is called a depth-first post-order traversal. This means it goes down to the bottom of the graph, to the dependencies that do not depend on anything else, and sets up their exports first.

Once the engine has wired up all of the exports below a module (all of the exports that the module depends on), it comes back up a level and wires up that module’s imports.
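
The traversal order itself is easy to sketch. The graph below is a hypothetical example, and postOrder is an illustrative helper rather than anything from the specification.

```javascript
// Hypothetical dependency graph: module -> the modules it imports.
const graph = {
  "main.js": ["counter.js", "display.js"],
  "counter.js": ["util.js"],
  "display.js": ["util.js"],
  "util.js": [],
};

// Depth-first post-order: visit every dependency of a module before the
// module itself, and visit each module only once.
function postOrder(entry, visited = new Set(), order = []) {
  if (visited.has(entry)) return order;
  visited.add(entry);
  for (const dependency of graph[entry]) {
    postOrder(dependency, visited, order);
  }
  order.push(entry);
  return order;
}

const order = postOrder("main.js");
console.log(order);
// [ 'util.js', 'counter.js', 'display.js', 'main.js' ]
```

The module with no dependencies comes first and the entry point comes last, so by the time a module’s imports are linked, the exports it needs already exist.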

Notice that an export and the import that refers to it point to the same location in memory. Wiring up the exports first guarantees that every import can be connected to its matching export.

This is different from CommonJS modules. In CommonJS, the entire export object is copied on export. This means that any exported value (such as a number) is a copy.

This also means that if the exporting module changes that value later, the importing module does not see the change.

In contrast, ES modules use something called live bindings. The exporting module and the importing module both point to the same location in memory. When the exporting module changes one of its exported values, the change shows up in the importing module.

A module that exports values can change those values at any time, but an importing module cannot change the values of its imports. That said, if a module imports an object, it can change property values on that object.
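
The difference between a copied export and a live binding can be modeled in plain JavaScript. This is a sketch under assumptions: makeCopyingModule and makeLiveModule are hypothetical stand-ins, and a getter without a setter only approximates a real import binding (in a real ES module, assigning to an import throws a TypeError).

```javascript
// CommonJS-style export: the exports object receives a copy of the value.
function makeCopyingModule() {
  let count = 0;
  const exports = { count, increment: () => { count += 1; } };
  return exports;
}

// Live-binding-style export: importers read through to the exporter's
// variable. A getter with no setter models "read but do not assign".
function makeLiveModule() {
  let count = 0;
  const namespace = { increment: () => { count += 1; } };
  Object.defineProperty(namespace, "count", { get: () => count, enumerable: true });
  return namespace;
}

const copying = makeCopyingModule();
copying.increment();
const copiedCount = copying.count; // still the value copied at export time

const live = makeLiveModule();
live.increment();
const liveCount = live.count; // reflects the exporter's current value

// Assigning through the getter-only property does not change the value
// (it throws a TypeError in strict code and is ignored in non-strict code).
try {
  live.count = 100;
} catch (error) {
  // reached only when this file runs as strict code
}
const afterAssignment = live.count;

console.log(copiedCount, liveCount, afterAssignment); // 0 1 1
```

The copying module’s importer is stuck with 0, while the live module’s importer sees the update but cannot overwrite it.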

The reason for using live bindings is that the engine can wire up all of the modules without running any code. This helps with evaluation when there are cyclic dependencies, as I’ll explain next.

So at the end of this step, all of the module instances have the memory locations for their exported and imported variables wired up.

Now we can evaluate the code and fill in the actual values for these in-memory addresses.

Evaluation

The final step is to populate these memory locations with values. The JS engine does this by executing top-level code (code outside of functions).

Besides filling in these memory locations, evaluating the code can also trigger side effects. For example, a module might make a call to a server.

Because of the potential for side effects, you only want to evaluate a module once. Linking during instantiation can be repeated and always gives the same result, but evaluation can give different results depending on how many times it is done.

This is one of the reasons for having the module map. The module map caches each module by its canonical URL so that there is only one module record per module. This ensures that each module is evaluated only once. As with instantiation, evaluation is done as a depth-first post-order traversal.

What about the cyclic dependencies we mentioned earlier?

With a cyclic dependency, you end up with a loop in the graph. Usually it is a long loop, but to explain the problem I’m going to use a contrived example with a short one.

Let’s look at how this works with CommonJS modules. First, the main module executes up to the require statement. It then goes off to load the counter module.

The counter module then tries to access message from the export object. But since message has not been evaluated in the main module yet, this returns undefined. The JS engine allocates memory for the local variable and sets its value to undefined.

Evaluation continues down to the end of the counter module’s top-level code. We want to see whether message eventually gets the correct value (after main.js is evaluated), so we set a timeout. Evaluation then resumes in main.js.

The message variable is initialized and added to memory. But since there is no connection between main’s exported message and the value that counter got from require, message stays undefined in the importing module (counter.js).

If the export were handled with live bindings, the counter module would eventually see the correct value. By the time the timeout runs, main.js has finished evaluating and the value has been filled in.

Supporting these cycles is a big rationale behind the design of ES modules. It is the three-phase design that makes them possible.

Current status of ES modules

With the release of Firefox 60 in early May, all major browsers will support ES modules by default. Node is also adding support, and a working group is sorting out compatibility issues between CommonJS and ES modules.

type=module
dynamic import
import.meta
module resolution