Comics: Simple ES module

ES Modules: A Cartoon Deep-dive

Originally written by Lin Clark

The Nuggets translation Project

Permanent link to this article: github.com/xitu/gold-m…

Translator: stormluke

Proofreader: Starrier, zephyrJS

ES modules bring an official, standardized module system to JavaScript. It took a while to get here, though: nearly 10 years of standardization work.

But the wait is almost over. With the release of Firefox 60 in May (currently in beta), all major browsers will support ES modules, and the Node modules working group is working on adding ES module support to Node.js. ES module integration for WebAssembly is underway as well.

Many JavaScript developers know that ES modules have been controversial. But few actually understand how ES modules work.

Let’s take a look at what problem ES modules solve and how they differ from modules in other module systems.

What problem do modules solve?

When you think about it, coding in JavaScript is all about managing variables. It’s all about assigning a value to a variable, or adding a number to a variable, or combining two variables and putting the result into another variable.

Because so much of your code is about changing variables, how you organize those variables has a big impact on how well you can code and how well you can maintain that code.

Just having to consider a few variables at a time makes things easier. JavaScript has a way to help you do this called scope. Due to JavaScript’s scoping rules, one function cannot access a variable defined in another function.

That’s good. This means that when you write a function, you only care about the function itself. You don’t have to worry about what other functions might do to the variables inside the function.

Still, it has drawbacks. This makes it a little difficult to share variables between functions.
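Here is a minimal sketch of that scoping rule. The function and variable names are made up for illustration.

```javascript
// Each function gets its own scope, so `total` is private to addPrices.
function addPrices(prices) {
  let total = 0; // local to addPrices
  for (const price of prices) {
    total += price;
  }
  return total;
}

function peekAtTotal() {
  // `total` does not exist in this scope. Using typeof on an undeclared
  // name gives "undefined" instead of throwing.
  return typeof total;
}

const sum = addPrices([2, 3, 5]);
const seen = peekAtTotal();
console.log(sum, seen);
```

The only way for peekAtTotal to get at the number is for addPrices to hand it over, for example as a return value.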

What if you do want to share a variable outside of its scope? A common way to handle this is to put it on a scope above you, for example on the global scope.

You may remember this from the days of jQuery. Before loading any jQuery plug-ins, you must ensure that jQuery is in global scope.

This works, but it causes some annoying problems.

First, all script tags need to be in the correct order. So you have to be careful to make sure that order doesn’t get messed up.

If you mess up the order, your application will throw an error during runtime. When the function looks for the jQuery it expects — in the global scope — and doesn’t find it, it throws an error and stops running.

This makes maintaining code tricky. Removing old code or script tags becomes a game of roulette, because you don’t know what might break. The dependencies between different parts of your code are implicit. Any function can grab anything on the global scope, so you don’t know which functions depend on which scripts.

The second problem is that because these variables are on the global scope, any code inside that scope can change them. Malicious code can change a variable on purpose to make your code do something you didn’t mean it to, and non-malicious code can clobber your variable by accident.
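As a rough illustration of the second problem, here are two pretend scripts that share the global scope. The name formatPrice and both implementations are hypothetical.

```javascript
// Hypothetical "script A": publishes a helper on the global scope.
globalThis.formatPrice = (n) => `$${n.toFixed(2)}`;

// Hypothetical "script B": an unrelated script that happens to reuse the name.
globalThis.formatPrice = (n) => String(n);

// Code from script A now silently gets script B's version.
const label = globalThis.formatPrice(3);
console.log(label); // "3", not "$3.00"
```

Nothing warns you that the helper was replaced. The code simply behaves differently.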

How do modules help?

Modules give you a better way to organize variables and functions. Modules allow you to group meaningful variables and functions together.

This puts these functions and variables into the module scope. Module scope can be used to share variables between functions in a module.

Unlike function scopes, though, module scopes have a way of making their variables available to other modules as well. A module can say explicitly which of its variables, classes, or functions should be available.

When something is made available to other modules, it’s called an export. Once you have an export, other modules can explicitly say that they depend on that variable, class, or function.

Because this is an explicit relationship, you can tell which modules will break if you remove another one.

Once you can export and import variables between modules, it’s much easier to break the code down into chunks that work independently. You can then combine or recombine these code blocks (like Lego) to create a variety of different applications from the same set of modules.

Because modules are so useful, there have been multiple attempts to add module functionality to JavaScript. Today there are two module systems in active use. CommonJS (CJS) is what Node.js has used historically. ESM (ECMAScript modules) is a newer system that has been added to the JavaScript specification. Browsers already support ES modules, and Node is adding support.

Let’s take a closer look at how this new module system works.

How ES modules work

When you’re developing with modules, you build up a graph of dependencies. The connections between different dependencies come from the import statements that you use.

These import statements are how the browser or Node knows exactly what code it needs to load. You give it a file to use as an entry point to the graph. From there it just follows the import statements to find the rest of the code.

But the browser can’t use the files themselves. It needs to parse each of these files into a data structure called a module record. That way, it knows exactly what’s going on in the file.

After that, the module record needs to be converted to a Module instance. An instance consists of two parts: code and state.

The code is basically a set of instructions. It’s like a recipe for how to make something. But by itself, you can’t use the code to do anything. You need raw materials to use with those instructions.

What is state? State gives you those raw materials. State is the actual values of the variables at any point in time. Of course, these variables are just nicknames for the boxes in memory that hold the values.

So module instances combine code (the list of instructions) with state (the values of all variables).

What we need is a module instance for each module. Module loading is the process of going from this entry point file to a full graph of module instances.

For ES modules, this happens in three steps:

Construction – Find, download, and parse all of the files into module records.
Instantiation – Find boxes in memory to hold all of the exported values (but don’t fill them in with values yet). Then make both exports and imports point to those boxes in memory. This is called linking.
Evaluation – Run the code to fill in the boxes with the variables’ actual values.
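To make the last two steps more concrete, here is a toy sketch. It is not how an engine implements them. The records array is hand-written (it stands in for the result of step one), the records are assumed to be listed with dependencies first, and every name in it is hypothetical.

```javascript
// Step 1 is assumed done: these objects stand in for the module records
// an engine would get from parsing two files.
const records = [
  {
    url: "/counter.js",
    imports: [],
    exports: ["count"],
    execute(scope) { scope.count.value = 5; },
  },
  {
    url: "/main.js",
    imports: [{ from: "/counter.js", name: "count" }],
    exports: ["doubled"],
    execute(scope) { scope.doubled.value = scope.count.value * 2; },
  },
];

// Step 2: allocate an empty slot for every export, then make each import
// refer to the exporter's slot. No module code runs here.
function instantiate(moduleRecords) {
  const scopes = new Map();
  for (const record of moduleRecords) {
    const scope = {};
    for (const name of record.exports) {
      scope[name] = { value: undefined };
    }
    scopes.set(record.url, scope);
  }
  for (const record of moduleRecords) {
    const scope = scopes.get(record.url);
    for (const { from, name } of record.imports) {
      scope[name] = scopes.get(from)[name]; // the same slot object, not a copy
    }
  }
  return scopes;
}

// Step 3: run each module's code so the slots receive their values.
function evaluate(moduleRecords, scopes) {
  for (const record of moduleRecords) {
    record.execute(scopes.get(record.url));
  }
}

const scopes = instantiate(records);
const beforeEvaluation = scopes.get("/main.js").doubled.value;
evaluate(records, scopes);
const afterEvaluation = scopes.get("/main.js").doubled.value;
console.log(beforeEvaluation, afterEvaluation); // undefined 10
```

The point of the split is visible in the two reads of doubled: the slot exists after instantiation, but it only holds a value after evaluation.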

People talk about ES modules being asynchronous. You can think of them as asynchronous because the work is split into these three phases (loading, instantiating, and evaluating) and those phases can be done separately.

This means that the ES specification does introduce an asynchrony that does not exist in CommonJS. I’ll explain later, but in CJS, a module and all its dependencies are loaded, instantiated, and evaluated all at once without interruption.

Of course, these steps themselves do not have to be asynchronous. They can be done synchronously. It depends on who’s doing the loading. This is because the ES module specification does not control everything. There are actually two parts of the job, each controlled by a different specification.

The ES module specification explains how to parse a file into a module record, and how to instantiate and evaluate the module. However, it does not say how to obtain the files.

It’s the loader that gets the file. Loaders are defined in a different specification. For browsers, this specification is the HTML specification. But you can have different loaders depending on the platform you’re using.

The loader also controls exactly how modules are loaded. It calls the ES module methods: ParseModule, Module.Instantiate, and Module.Evaluate. It’s a bit like a puppeteer pulling the JS engine’s strings.

Now let’s go through each step in more detail.

Construction

Three things happen for each module during the construction phase.

Find out where to download the file containing the module (also known as module resolution)
Get the file (downloaded from the URL or loaded from the file system)
Parse files into module records

Finding the file and fetching it

The loader takes care of finding the file and downloading it. First it needs to find the entry point file. In HTML, you tell the loader where to find it by using a script tag.

But how does it find the next bunch of modules, the ones that the entry file (main.js in this example) directly depends on?

This is where import statements come in. One part of the import statement is called the module specifier. It tells the loader where it can find each next module.

One thing to note about module specifiers: they sometimes need to be handled differently in browsers and in Node. Each host has its own way of interpreting the module specifier strings. To do this, it uses something called a module resolution algorithm, which differs between platforms. Currently, some module specifiers that work in Node won’t work in the browser, but there is ongoing work to fix this.

Until that’s fixed, browsers only accept URLs as module specifiers. They will load the module file from that URL. But that doesn’t happen for the whole graph at the same time. You don’t know what dependencies a module needs you to fetch until you’ve parsed the file, and you can’t parse the file until you’ve fetched it.

This means we have to walk through the dependency tree layer by layer, parse a file, find its dependencies, and then find and load those dependencies.
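The layer-by-layer walk can be sketched like this. Everything here is an assumption made for illustration: the files map plays the role of a server, the regular expression is a very rough stand-in for real parsing, and the string in each import statement is used directly as the lookup key with no resolution step.

```javascript
// Hypothetical in-memory "server": URL -> source text.
const files = new Map([
  ["/main.js", 'import { count } from "/counter.js";\nimport { render } from "/display.js";'],
  ["/counter.js", 'import { render } from "/display.js";\nexport let count = 5;'],
  ["/display.js", "export function render() {}"],
]);

// Rough stand-in for parsing: pull the quoted string out of each static
// import. A real parser handles many more forms than this.
function findImports(source) {
  const pattern = /import\s+[^'"]*?from\s+["']([^"']+)["']/g;
  return [...source.matchAll(pattern)].map((match) => match[1]);
}

function discover(entry) {
  const seen = new Set([entry]);
  const layers = [];
  let layer = [entry];
  while (layer.length > 0) {
    layers.push(layer);
    const next = [];
    for (const url of layer) {
      const source = files.get(url); // "fetch" the file
      for (const dependency of findImports(source)) { // "parse" it
        if (!seen.has(dependency)) {
          seen.add(dependency);
          next.push(dependency);
        }
      }
    }
    layer = next;
  }
  return layers;
}

const layers = discover("/main.js");
console.log(layers); // [["/main.js"], ["/counter.js", "/display.js"]]
```

Each layer can only be requested after the layer above it has been fetched and parsed, which is why this part is slow over a network.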

If the main thread waits for these files to download, many other tasks will pile up in the queue.

That’s because in a browser, the downloading part takes a long time.

Blocking the main thread like this would make an app that uses modules too slow to use. This is one of the reasons the ES module spec splits the algorithm into multiple phases. Splitting construction out into its own phase lets browsers fetch files and build up their understanding of the module graph before getting down to the synchronous work of instantiating.

This approach of splitting the algorithm into phases is one of the key differences between ES modules and CommonJS modules.

CommonJS can do things differently because loading files from the file system takes much less time than downloading them over the Internet. This means Node can block the main thread while it loads a file. And since the file is already loaded, it makes sense to instantiate and evaluate right away (these aren’t separate phases in CommonJS). This also means that you walk down the whole tree, loading, instantiating, and evaluating any dependencies, before you return the module instance.

The CommonJS approach has a few implications, and I’ll explain more about those later. One of them is that in Node with CommonJS modules, you can use variables in your module specifier. You execute all of the code in this module (up to the require statement) before looking for the next module. That means the variable will have a value when you go to do module resolution.

But with ES modules, you build up the whole module graph beforehand, before doing any evaluation. This means you can’t have variables in your module specifiers, because those variables don’t have values yet.

But sometimes it’s really useful to use variables in the module path. For example, you might need to toggle loading a module depending on how the code is running or the environment in which it is running.

To make this possible for ES modules, there’s a proposal called dynamic import. With it, you can use an import statement like import(`${path}/foo.js`).
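A small sketch of what that can look like in practice. The ./config/ layout and the function names are hypothetical, and loadConfig is only defined here, not called, because those files don’t exist in this example.

```javascript
// Hypothetical layout: one settings module per environment,
// for example ./config/production.js.
function configSpecifier(environment) {
  return `./config/${environment}.js`;
}

// import() accepts any expression, so the string can be computed at run time.
// It returns a promise for the module's namespace object.
async function loadConfig(environment) {
  const namespace = await import(configSpecifier(environment));
  return namespace.default;
}

const specifier = configSpecifier("production");
console.log(specifier); // "./config/production.js"
```

A caller would write something like loadConfig("production").then(useConfig), where useConfig is whatever the application does with the settings.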

The idea is that any file loaded via import() is used as an entry point to a separate dependency graph. Dynamically imported modules open a new dependency graph and process it separately.

One thing to note, though: any module that is in both of these graphs will share a module instance. This is because the loader caches module instances. For each module in a particular global scope, there will be only one module instance.

This means less work for the engine. For example, this means that even if multiple modules depend on a module, that module’s files will only be fetched once. (This is one reason for caching modules, and we’ll see another in the evaluation section.)

The loader manages this cache using something called a module map. Each global scope tracks its modules in a separate module map.

When the loader starts fetching a URL, it puts the URL into the module map and marks it as fetching a file. It then issues the request and proceeds to fetch the next file.

What happens if another module depends on the same file? The loader looks up each URL in the module map. If it sees “fetching” there, it just moves on to the next URL.

But the module map doesn’t just keep track of which files are being fetched. It also serves as a cache for the modules, as we’ll see next.
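A toy version of that bookkeeping might look like the following. The function names are invented, and startFetch only counts calls instead of touching the network.

```javascript
const moduleMap = new Map(); // URL -> "fetching" or a module record
let requestCount = 0;

// Hypothetical stand-in for a network request: it only counts calls.
function startFetch(url) {
  requestCount += 1;
}

function requestModule(url) {
  if (moduleMap.has(url)) {
    return moduleMap.get(url); // already in flight, or already available
  }
  moduleMap.set(url, "fetching");
  startFetch(url);
  return "fetching";
}

// Called once the file has arrived and been parsed.
function finishModule(url, record) {
  moduleMap.set(url, record);
}

requestModule("/display.js"); // requested on behalf of main.js
requestModule("/display.js"); // requested again on behalf of counter.js
finishModule("/display.js", { url: "/display.js" });
const cached = requestModule("/display.js");
console.log(requestCount, cached.url); // 1 "/display.js"
```

The second request finds the “fetching” marker and does nothing, and the third gets the stored record back.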

Parsing

Now that we have the file, we need to parse it into a module record. This helps the browser understand the different parts of the module.

Once a module record is created, it is recorded in the module map. This means that at any time after that, if there is a request for it, the loader can retrieve it from the map.

There is one detail in parsing that may seem trivial but actually has pretty big implications. All modules are parsed as if they had “use strict” at the top. There are other slight differences too. For example, the keyword await is reserved in a module’s top-level code, and the value of this is undefined.

These different ways of parsing are called “parse goals”. If you parse the same file with a different goal, you will end up with different results. So before you start parsing, you want to know what kind of file you’re parsing: whether it’s a module or not.
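You can get a feel for the strict-mode part without a module at all. The sketch below uses an ordinary function with a "use strict" directive, which only approximates what happens at the top level of a module. The names are made up.

```javascript
function strictLikeAModule() {
  "use strict";
  const results = { thisValue: this, assignmentError: null };
  try {
    undeclaredVariable = 1; // no let/const/var: an error in strict mode
  } catch (error) {
    results.assignmentError = error.name;
  }
  return results;
}

const results = strictLikeAModule();
console.log(results.thisValue);       // undefined
console.log(results.assignmentError); // "ReferenceError"
```

In non-strict code the same assignment would quietly create a global variable, which is exactly the kind of accident described earlier.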

In browsers this is pretty easy. You just put type="module" on the script tag. This tells the browser that the file should be parsed as a module. And since only modules can be imported, the browser knows that any imports are modules, too.

But in Node, you don’t use HTML tags, so you don’t have the option of using a type attribute. One way the community has tried to solve this is with the .mjs extension. Using that extension tells Node, “this file is a module”. You’ll see people talking about this as the signal for the parse goal. The discussion is still ongoing, so it’s unclear what signal the Node community will decide to use in the end.

Either way, the loader determines whether to parse the file as a module or not. If it is a module and there are imports, the loader starts the process over again until all of the files are fetched and parsed.

And we’re done! At the end of the loading process, you’ve gone from having just an entry point file to having a bunch of module records.

The next step is to instantiate the module and link all instances together.

Instantiation

As I mentioned before, an instance combines code with state. That state lives in memory, so the instantiation step is all about wiring things up to memory.

First, the JS engine creates a module environment record. This manages the variables for the module record. Then it finds boxes in memory for all of the exports. The module environment record keeps track of which box in memory is associated with each export.

These boxes in memory won’t get their values yet. Their actual values are only filled in after evaluation. There is one caveat to this rule: any exported function declarations are initialized during this phase. This makes things easier for evaluation.

To instantiate the module graph, the engine does what’s called a depth-first post-order traversal. This means it goes down to the bottom of the graph, to the dependencies at the bottom that don’t depend on anything else, and sets up their exports.

The engine finishes wiring up all of the exports below a module, that is, all of the exports that the module depends on. Then it comes back up a level to wire up the imports from that module.
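The traversal order itself is easy to sketch. The graph below is a hypothetical three-module example, written as a map from each URL to the URLs it imports.

```javascript
// Hypothetical graph: URL -> URLs it imports.
const graph = new Map([
  ["/main.js", ["/counter.js", "/display.js"]],
  ["/counter.js", ["/display.js"]],
  ["/display.js", []],
]);

function postOrder(entry) {
  const visited = new Set();
  const order = [];
  function visit(url) {
    if (visited.has(url)) return; // also keeps a cycle from looping forever
    visited.add(url);
    for (const dependency of graph.get(url)) {
      visit(dependency);
    }
    order.push(url); // recorded only after everything below it
  }
  visit(entry);
  return order;
}

const order = postOrder("/main.js");
console.log(order); // ["/display.js", "/counter.js", "/main.js"]
```

A module only shows up in the list after all of its dependencies, so by the time its imports are wired up, the exports they refer to already exist.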

Note that both exports and imports point to the same region of memory. Connecting exports first ensures that all exports can be connected to the corresponding imports.

This differs from CommonJS modules. In CommonJS, the entire export object is copied on export. This means that any values that are exported (like numbers) are copies.

This means that if the exporting module changes that value later, the importing module doesn’t see the change.

In contrast, ES modules use something called live bindings. Both modules point to the same location in memory. This means that when the exporting module changes a value, that change shows up in the importing module.

A module that exports values can change those values at any time, but an importing module cannot change the values it imports. However, if a module imports an object, it can change the property values on that object.
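The difference between a copy and a shared location can be imitated in plain JavaScript. In this sketch the getter is only a stand-in for a shared binding (engines share the memory location directly), and the object names are invented.

```javascript
// Exporting side: a counter with its own state.
let count = 5;
function increment() {
  count += 1;
}

// Copy-on-export: the current value is copied onto the exports object.
const copiedExports = { count, increment };

// Imitation of a shared binding: the getter reads the exporter's variable
// every time it is accessed.
const liveExports = {
  get count() {
    return count;
  },
  increment,
};

increment();
console.log(copiedExports.count); // 5, the copy is stale
console.log(liveExports.count);   // 6, the change is visible
```

The getter-only property also mirrors the other half of the rule: the importing side can read count but has no way to set it.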

The reason for having live bindings like this is that you can wire up all of the modules without running any code. This helps with evaluation when you have cyclic dependencies, as I’ll explain below.

So at the end of this step, we have all of the instances and the memory locations for the exported and imported variables wired up.

Now we can start evaluating the code and filling in those memory locations with their values.

Evaluation

The last step is to fill in the values in memory. The JS engine does this by executing the top-level code — code outside of functions.

Besides filling in these boxes in memory, evaluating the code can also trigger side effects. For example, a module might make a call to a server.

Because of the potential for side effects, you only want to evaluate the module once. Unlike the linking that happens during instantiation, which can be done multiple times with exactly the same result, evaluation can have different results depending on how many times you do it.

This is one reason to have the module map. The module map caches the module by canonical URL so that there is only one module record for each module. That ensures each module is only executed once. Just as with instantiation, this is done as a depth-first post-order traversal.

What about those circular dependencies we talked about earlier?

If you have a cyclic dependency, you end up with a loop in the graph. Usually, this is a long loop. But to explain the problem, I’m going to use a contrived example with a short loop.

Let’s look at how this would work with CommonJS modules. First, the main module executes up to the require statement. Then it goes to load the counter module.

The counter module then tries to access message from the export object. But since this hasn’t been evaluated in the main module yet, it returns undefined. The JS engine allocates space in memory for the local variable and sets the value to undefined.

Evaluation continues down to the end of the counter module’s top-level code. We want to see whether we’ll get the correct value for message eventually (after main.js is evaluated), so we set up a timeout. Then evaluation resumes on main.js.

The message variable is initialized and added to memory. But since there’s no connection between the two, it stays undefined in the counter module.

If the export were handled using live bindings, the counter module would see the correct value eventually. By the time the timeout runs, main.js’s evaluation has completed and filled in the value.
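Both outcomes can be simulated in one plain script. This is a sketch under several assumptions: the two “modules” are ordinary functions, the pending array stands in for the timeout, and the message text is a made-up value.

```javascript
const pending = []; // stands in for callbacks scheduled with a timeout
const log = {};

// Copy-style handling: counter copies main's `message` while loading.
function runCopyStyle() {
  const mainExports = {};
  // main.js starts, then loads counter.js before it has set message.
  (function counterModule() {
    const message = mainExports.message; // undefined: main hasn't got this far
    pending.push(() => { log.copied = message; });
  })();
  mainExports.message = "Eval complete"; // main.js resumes
}

// Binding-style handling: counter keeps a reference to the slot, not a copy.
function runBindingStyle() {
  const mainBindings = { message: undefined }; // slot exists before code runs
  (function counterModule() {
    pending.push(() => { log.bound = mainBindings.message; });
  })();
  mainBindings.message = "Eval complete"; // main.js resumes
}

runCopyStyle();
runBindingStyle();
for (const callback of pending) callback(); // the "timeouts" fire afterwards

console.log(log.copied); // undefined
console.log(log.bound);  // "Eval complete"
```

In the first version the counter module is stuck with the value it copied too early. In the second it reads the slot later, after main.js has filled it in.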

Supporting these cyclic dependencies is a big reason behind the ES module design. It’s this three-stage design that makes it possible.

What’s the status of ES modules?

With the release of Firefox 60 in early May, all major browsers will support ES modules by default. Node is also adding support, with a working group dedicated to figuring out compatibility issues between CommonJS and ES modules.

This means you’ll be able to use the script tag with type=module, and use imports and exports. However, more module features are yet to come. The dynamic import proposal is at Stage 3 in the specification process, as is import.meta, which will help support Node.js use cases, and the module resolution proposal will also help smooth over differences between browsers and Node.js. So you can expect working with modules to get even better in the future.

Thank you

Thank you to everyone who gave feedback on this post, or whose writing or discussions informed it, including Axel Rauschmayer, Bradley Farias, Dave Herman, Domenic Denicola, Havi Hoffman, Jason Weathersby, JF Bastien, Jon Coppeard, Luke Wagner, Myles Borins, Till Schneidereit, Tobias Koppers and Yehuda Katz, as well as the members of the WebAssembly community group, the Node modules working group, and TC39.

About Lin Clark

Lin is an engineer in Mozilla’s developer relations group. She studies JavaScript, WebAssembly, Rust and Servo and has drawn code comics.

code-cartoons.com
@linclark