ES modules bring an official, standardized module system to JavaScript. It took nearly 10 years of standardization work to get here, but the wait is almost over. With the release of Firefox 60, all major browsers will support ES modules, and the Node Modules Working Group is currently working on adding ES module support to Node.js. Integration of ES modules for WebAssembly is underway as well.

As many JavaScript developers know, ES modules have been controversial. But few people actually understand how ES modules work.

Let’s take a look at what problem ES modules solve and how they differ from modules in other module systems.

What problems can modules solve?

When you think about it, coding in JavaScript is all about managing variables. It’s all about assigning values to variables, or adding numbers to variables, or combining two variables and putting them into another variable.

Because so much of your code is about changing variables, how you organize those variables can have a big impact on how your code is written and how it is maintained.

Having only a few variables to think about at a time makes things easier. JavaScript has a way to help you do this, called scope. Because of the way scopes work in JavaScript, functions cannot access variables defined in other functions.

That’s good. It means that when you’re working on one function, you can think about just that one function. You don’t have to worry about what other functions might do to your variables.
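As a minimal sketch of that isolation, consider two made-up functions (the names `countVisits` and `report` are purely illustrative):

```javascript
function countVisits() {
  const visits = 3; // visible only inside countVisits
  return visits + 1;
}

function report() {
  // `visits` is not in scope here. typeof on an undeclared name
  // yields "undefined" instead of throwing.
  return typeof visits;
}

console.log(countVisits()); // 4
console.log(report());      // "undefined"
```

Nothing `report` does can touch `visits`, which is exactly what makes `countVisits` easy to reason about.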

There is, however, a drawback. It’s really hard to share variables between different functions.

What if you do want to share a variable outside of its scope? A common way to handle this is to put it in a scope above you, such as the global scope.

You probably remember this from the jQuery era. Before loading any jQuery plug-ins, you had to make sure that jQuery was in the global scope.

This works, but it causes some annoying problems.

First, all of your script tags must be in the correct order. Then, you must be careful that no one messes up that order.

If you do mess up the order, your application will throw an error at runtime. When a function goes looking for jQuery where it expects it, on the global, and does not find it, it throws an error and stops executing.

This makes maintaining the code tricky. It makes removing old code or script tags a game of roulette. You don’t know what might break, because the dependencies between these different parts of the code are implicit. Any function can grab anything on the global, so you don’t know which functions depend on which scripts.

The second problem is that because these variables are in the global scope, every part of the code inside that global scope can change them. Malicious code can change a variable on purpose to make your code do something you didn’t intend, or non-malicious code can clobber your variable by accident.
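Both problems can be reproduced in a few lines. The sketch below simulates three script tags in a single file; `myLib` and `pluginGreet` are hypothetical names, not part of any real library.

```javascript
// "Script 1": a library publishes itself on the global object.
globalThis.myLib = { version: "1.0", greet: (name) => `Hello, ${name}` };

// "Script 2": a plug-in silently assumes myLib is already there.
function pluginGreet(name) {
  return globalThis.myLib.greet(name).toUpperCase();
}

const before = pluginGreet("ada"); // works while the global is intact

// "Script 3": unrelated code reuses the same global name.
globalThis.myLib = { version: "2.0" }; // no greet() any more

let failure = null;
try {
  pluginGreet("ada");
} catch (err) {
  failure = err; // the plug-in now fails at runtime
}

console.log(before);       // HELLO, ADA
console.log(failure.name); // TypeError
```

Nothing in the plug-in's code states that it needs `myLib`, so the breakage only shows up when the function actually runs.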

How can modules help?

Modules give you a better way to organize variables and functions. Using modules, you can group together the variables and functions that belong together.

This puts these functions and variables into the module scope. Module scope can be used to share variables between functions in a module.

But unlike function scopes, module scopes have a way of making their variables available to other modules as well. They can explicitly say which variables, classes, or functions in a module should be available.

When something is available to other modules, this is called an export. Once exported, other modules can explicitly say that they depend on the variable, class, or function.

Because this is an explicit relationship, you can tell which modules will break if you remove another one.

Once you can export and import variables between modules, it becomes much easier to break your code into small chunks that can work independently of each other. You can then combine and recombine these chunks to create all different kinds of applications from the same set of modules.
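Here is a small sketch of an explicit export. In a real project the module source would live in its own file (say, a hypothetical `cart.js`) and be pulled in with `import { total } from "./cart.js"`. To keep the example in a single runnable file, it loads the source from a `data:` URL instead; that is a convenience of this sketch, not something the article prescribes.

```javascript
// Normally this source would live in its own file, e.g. a hypothetical cart.js.
const cartSource = `
  const SHIPPING = 5;                   // not exported: private to the module
  export function total(prices) {       // exported: other modules may import it
    const sum = prices.reduce((a, b) => a + b, 0);
    return sum + SHIPPING;
  }
`;
const cartUrl = "data:text/javascript," + encodeURIComponent(cartSource);

// The importing side only ever sees what was explicitly exported.
const result = import(cartUrl).then((cart) => ({
  exported: Object.keys(cart),
  total: cart.total([10, 20]),
}));

result.then((r) => console.log(r.exported, r.total)); // [ 'total' ] 35
```

`SHIPPING` never leaves the module, while `total` is available to any module that asks for it by name.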

Because modules are so useful, there have been several attempts to add module functionality to JavaScript. Today, two module systems are in active use. CommonJS (CJS) is what Node.js has historically used. ESM (ECMAScript modules) is a newer system that has been added to the JavaScript specification. Browsers already support ES modules, and Node is adding support.

Let’s take a closer look at how this new module system works.

How do ES modules work?

When you develop with modules, you build up a graph of dependencies. The connections between the different dependencies come from the import statements you use.

These import statements are how the browser or Node knows exactly what code it needs to load. You give it a file to use as an entry point for the graph. From there, it follows any import statements to find the rest of the code.

But the files themselves are not something a browser can use. It needs to parse all of these files and turn them into data structures called module records. That way, it actually knows what is going on in each file.

After that, the module record needs to be turned into a module instance. An instance combines two things: code and state.

The code is basically a set of instructions. It’s like a recipe for how to do something. But by itself, you can’t do anything with this code.

What is state? State is the actual value of each variable at any point in time. Of course, these variables are just names for the locations in memory that hold the values.

Thus, module instances combine code (a list of instructions) with state (the values of all variables).

What we need is a module instance for each module. Module loading is the process of going from the entry point file to a complete graph of module instances.

For the ES module, this process is divided into three steps.

  1. Construction – Find, download, and parse all of the files into module records.
  2. Instantiation – Find locations in memory for all of the exported values (without filling in the values yet). Then make both exports and imports point to those locations in memory. This is called linking.
  3. Evaluation – Run the code to fill in those locations with the variables’ actual values.
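The three steps above can be sketched as a toy loader. This is only a simplified model under stated assumptions: the "files" are in-memory objects with hand-written `deps` and `exports` lists rather than parsed source text, and the `construct`, `instantiate`, and `evaluate` helpers are invented for illustration. It is not the specification's algorithm.

```javascript
const log = [];

// Hypothetical in-memory "files". A real loader would fetch and parse source text.
const files = {
  "main.js": {
    deps: ["counter.js"],
    exports: [],
    body: (own, imported) => {
      log.push(`main sees count = ${imported["counter.js"].count.value}`);
    },
  },
  "counter.js": {
    deps: [],
    exports: ["count"],
    body: (own) => {
      own.count.value = 5;
      log.push("counter evaluated");
    },
  },
};

// Phase 1, construction: follow deps from the entry point, one record per file.
function construct(url, records = new Map()) {
  if (records.has(url)) return records;
  records.set(url, { ...files[url], boxes: null, imported: {}, evaluated: false });
  for (const dep of files[url].deps) construct(dep, records);
  return records;
}

// Phase 2, instantiation: allocate empty boxes for exports, then link imports
// to the exporter's boxes. Boxes are created before recursing so that a cycle
// in the graph would still terminate.
function instantiate(url, records) {
  const record = records.get(url);
  if (record.boxes) return;
  record.boxes = Object.fromEntries(
    record.exports.map((name) => [name, { value: undefined }])
  );
  for (const dep of record.deps) {
    instantiate(dep, records);
    record.imported[dep] = records.get(dep).boxes; // the same boxes, not copies
  }
}

// Phase 3, evaluation: run each body once, dependencies first.
function evaluate(url, records) {
  const record = records.get(url);
  if (record.evaluated) return;
  record.evaluated = true;
  for (const dep of record.deps) evaluate(dep, records);
  record.body(record.boxes, record.imported);
}

const records = construct("main.js");
instantiate("main.js", records);
evaluate("main.js", records);
console.log(log); // [ 'counter evaluated', 'main sees count = 5' ]
```

No module body runs until the whole graph has been built and linked, which is the property the rest of the design relies on.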

People talk about ES modules being asynchronous. You can think of them as asynchronous because the work is split into three distinct phases (loading, instantiating, and evaluating), and these phases can be done separately.

This means that the specification does introduce a kind of asynchrony that does not exist in CommonJS. I’ll explain more later, but in CJS, a module and the dependencies below it are all loaded, instantiated, and evaluated at once, without any breaks in between.

However, the steps themselves need not be asynchronous. They can be done synchronously. It depends on what is doing the loading. This is because not everything is controlled by the ES module specification. There are actually two halves of the work, covered by different specifications.

The ES module specification says how you should parse files into module records, and how you should instantiate and evaluate those modules. However, it does not say how to get the files in the first place.

It is the loader that fetches the files, and the loader is defined in a different specification. For browsers, that is the HTML specification. But you can have different loaders depending on the platform you’re using.

The loader also controls exactly how modules are loaded. It calls the ES module methods: ParseModule, Module.Instantiate, and Module.Evaluate. It’s a bit like a puppeteer pulling the JS engine’s strings.

Now, let’s go through each step in detail.

Construction

During the construction phase, three things happen to each module.

  1. Figure out where to download the file containing the module from (also known as module resolution)
  2. Fetch the file (by downloading it from a URL or loading it from the file system)
  3. Parse the file into a module record

Finding the file and fetching it

The loader takes care of finding the file and downloading it. First, it needs to find the entry point file. In HTML, you can tell the loader where to find it through script tags.

But how does it find the next set of modules, the ones that main.js directly depends on?

This is where import statements come in. One part of the import statement is called the module specifier. It tells the loader where it can find each next module.

One thing to note about module specifiers: they sometimes need to be handled differently between browsers and Node. Each host has its own way of interpreting module specifier strings. To do this, it uses something called a module resolution algorithm, which varies from platform to platform. Currently, some module specifiers that work in Node will not work in browsers, but there is ongoing work to fix this.

Until that is fixed, browsers only accept URLs as module specifiers. They will load the module file from that URL. But that doesn’t happen for the whole graph at the same time. You don’t know what dependencies a module needs until you have parsed the file, and you can’t parse the file until you have fetched it.
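As a rough illustration of URL-based resolution, the sketch below resolves a specifier against the URL of the importing module using the standard `URL` constructor. The `resolveSpecifier` helper, the example.com base URL, and the `"lodash"` specifier are hypothetical, and the rule is a simplification of what browsers actually do.

```javascript
// Simplified sketch of browser-style resolution: only absolute URLs and
// specifiers starting with "/", "./" or "../" are accepted.
function resolveSpecifier(specifier, baseUrl) {
  if (/^(\/|\.\/|\.\.\/)/.test(specifier)) {
    return new URL(specifier, baseUrl).href;
  }
  try {
    return new URL(specifier).href; // already an absolute URL
  } catch {
    return null; // a "bare" specifier such as "lodash"
  }
}

const base = "https://example.com/app/main.js"; // hypothetical importing module

console.log(resolveSpecifier("./counter.js", base));   // https://example.com/app/counter.js
console.log(resolveSpecifier("../lib/util.js", base)); // https://example.com/lib/util.js
console.log(resolveSpecifier("lodash", base));         // null
```

A bare name like the last one is the kind of specifier that Node can resolve through its own algorithm but that a URL-only resolver has no answer for.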

This means we have to walk through the tree layer by layer, parse a file, find its dependencies, and then find and load those dependencies.

If the main thread waits for each of these files to download, many other tasks will pile up in its queue.

That’s because when you’re working in a browser, the downloading part takes a long time.

Blocking the main thread like this would make an app that uses modules too slow to use. This is one of the reasons the ES module specification splits the algorithm into multiple phases. Splitting construction out into its own phase lets browsers fetch files and build up their understanding of the module graph before getting down to the synchronous work of instantiating.

This approach of splitting the algorithm into phases is one of the main differences between ES modules and CommonJS modules.

CommonJS can do things differently because loading files from the file system takes much less time than downloading them over the Internet. This means Node can block the main thread while it loads the file. And since the file is already loaded, it makes sense to just instantiate and evaluate it right away (these are not separate phases in CommonJS). This also means that you walk down the whole tree, loading, instantiating, and evaluating any dependencies, before you return the module instance.

The CommonJS approach has a few implications that I’ll explain in more detail later. One thing it does mean is that in Node with CommonJS modules, you can use variables in your module specifier. You are executing all of the code in a module (up to the require statement) before looking for the next module. This means that by the time you do module resolution, the variable will have a value.

However, with ES modules, you build up the entire module graph beforehand, before you do any evaluation. This means that you cannot use variables in your module specifiers, because those variables do not have values yet.

But sometimes it is really useful to use variables in module paths. For example, you might want to switch which module you load depending on what the code is doing or what environment it is running in.

To make this possible with ES modules, you can use dynamic import, which is written like a function call: import(`${path}/foo.js`).

The way this works is that any file loaded using import() is treated as the entry point to a separate graph. The dynamically imported module starts a new graph, which is processed separately.
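A small sketch of a specifier that is only known at run time follows. The `locales` table and `loadGreeting` helper are made up for illustration, and the `data:` URLs stand in for real files (for example a hypothetical `./locales/fr.js`) so that the example runs as a single file.

```javascript
// Stand-ins for real module files; data: URLs keep the sketch in one file.
const toDataUrl = (source) => "data:text/javascript," + encodeURIComponent(source);
const locales = {
  en: toDataUrl('export default "Hello";'),
  fr: toDataUrl('export default "Bonjour";'),
};

// The specifier is computed at run time, which a static import cannot do.
function loadGreeting(lang) {
  return import(locales[lang]).then((module) => module.default);
}

const greeting = loadGreeting("fr");
greeting.then((text) => console.log(text)); // Bonjour
```

Because `import()` returns a promise, the surrounding code keeps running while the new graph is loaded.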

There is one thing to note, though: any module that appears in both of these graphs will share a single module instance. This is because the loader caches module instances. For each module in a particular global scope, there will be only one module instance.

This means less work for the engine. For example, the module file will only be fetched once, even if multiple modules depend on it. (This is one reason to cache modules. We’ll see another in the evaluation section.)

The loader manages this cache using something called a module map. Each global keeps track of its modules in a separate module map.

When the loader retrieves a URL, it puts the URL into the module map, noting that it is currently retrieving a file. It will then issue the request and proceed to begin retrieving the next file.

What happens if another module depends on the same file? The loader looks up each URL in the module map. If it sees fetching there, it just moves on to the next URL.

But the module map doesn’t just keep track of which files are being fetched. It also serves as a cache for the modules, as we’ll see next.
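The bookkeeping described above can be sketched with a plain `Map`. The function names and URLs are hypothetical, and a real loader would issue a network request where this sketch only records the URL.

```javascript
const moduleMap = new Map(); // URL -> "fetching" or a finished module record
const requestsSent = [];

function requestModule(url) {
  if (moduleMap.has(url)) return false; // already fetching or fetched: skip
  moduleMap.set(url, "fetching");
  requestsSent.push(url);               // a real loader would start a request here
  return true;
}

function finishModule(url, record) {
  moduleMap.set(url, record);           // the same map now serves as the cache
}

requestModule("https://example.com/main.js");
requestModule("https://example.com/counter.js");
requestModule("https://example.com/counter.js"); // second dependent: no new request
finishModule("https://example.com/counter.js", { url: "https://example.com/counter.js" });

console.log(requestsSent.length); // 2
```

The entry for a URL starts as the marker `"fetching"` and is later replaced by the record itself, so one structure answers both "is this in flight?" and "do we already have it?".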

Parsing

Now that we have the file, we need to parse it into a module record. This helps the browser understand the different parts of the module.

Once the module record is created, it is placed in the module map. This means that whenever it is requested from this point on, the loader can pull it from that map.

There’s one detail in parsing that may seem trivial, but actually has big implications. All modules are parsed as if they had “use strict” at the top. There are other subtle differences, too. For example, the keyword await is reserved in a module’s top-level code, and the value of this is undefined.

This different way of parsing is called a “parse goal”. If you parse the same file with a different goal, you will end up with different results. So you want to know which kind of file you are parsing, module or not, before you start parsing.
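Two of these differences can be observed directly. The sketch below loads a small source string as a module through a `data:` URL (a convenience chosen only to keep the example in one file; the variable names are illustrative).

```javascript
// Source text that behaves differently when parsed as a module.
const source = `
  export const topLevelThis = typeof this;   // "undefined" in a module
  export let strict = false;
  try {
    undeclaredName = 1;                      // a ReferenceError in strict mode
  } catch (err) {
    strict = true;
  }
`;
const url = "data:text/javascript," + encodeURIComponent(source);

const parsedAsModule = import(url).then((ns) => ({
  topLevelThis: ns.topLevelThis,
  strict: ns.strict,
}));

parsedAsModule.then((r) => console.log(r)); // { topLevelThis: 'undefined', strict: true }
```

The same text run as a classic, non-strict script would see a defined top-level `this` and would quietly create a global instead of throwing.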

In browsers, this is pretty easy. You just put type="module" on the script tag. This tells the browser that the file should be parsed as a module. And since only modules can be imported, the browser knows that any imports are modules, too.

But in Node, you don’t use HTML tags, so you don’t have the option of using a type attribute. One way the community has tried to solve this is with the .mjs extension. Using that extension tells Node, “This file is a module.” You’ll see people talking about this as the signal for the parse goal. The discussion is still ongoing, so it’s unclear what signal the Node community will ultimately decide to use.

Either way, the loader determines whether to parse the file as a module or not. If it is a module and there are imports, it then starts the process over again, until all of the files have been fetched and parsed.

We’re done! By the end of the loading process, you’ve gone from having a single entry point file to having many module records.

The next step is to instantiate the module and link all instances together.

Instantiation

As I mentioned earlier, an instance combines code with state. That state lives in memory, so the instantiation step is all about wiring things up to memory.

First, the JS engine creates a module environment record. This manages the variables for the module record. Then it finds locations in memory for all of the exports. The module environment record keeps track of which location in memory is associated with each export.

These locations in memory do not get their values yet. Their actual values are filled in only after evaluation. There is one caveat to this rule: any exported function declarations are initialized during this phase. This makes evaluation much easier.

To instantiate the module graph, the engine does what is called a depth-first post-order traversal. This means it goes down to the bottom of the graph, to the dependencies at the bottom that don’t depend on anything else, and sets up their exports.

The engine finishes wiring up all of the exports below a module, that is, all of the exports that the module depends on. Then it comes back up a level to wire up the imports of that module.

Note that both exports and imports point to the same location in memory. Connecting the export first ensures that all imports can connect to the matching export.

This is different from CommonJS modules. In CommonJS, the entire export object is copied on export. This means that any values that are exported, such as numbers, are copies.

This means that if the exporting module changes that value later, the importing module does not see the change.

In contrast, ES modules use something called live bindings. Both modules point to the same location in memory. This means that when the exporting module changes a value, that change shows up in the importing module.

A module that exports values can change those values at any time, but an importing module cannot change the values it imports. That being said, if a module imports an object, it can change the property values on that object.
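Live bindings can be observed with a real ES module. The sketch below loads a small counter module from a `data:` URL so that it fits in one file; the module source and variable names are made up for illustration.

```javascript
const counterSource = `
  export let count = 5;
  export function increment() { count += 1; }
`;
const counterUrl = "data:text/javascript," + encodeURIComponent(counterSource);

const observed = import(counterUrl).then((counter) => {
  const before = counter.count;
  counter.increment();                // the exporter changes its own variable
  const after = counter.count;        // the importer sees the new value
  // Importers cannot assign to a binding; Reflect.set reports the refusal.
  const writable = Reflect.set(counter, "count", 100);
  return { before, after, writable, final: counter.count };
});

observed.then((r) => console.log(r));
// { before: 5, after: 6, writable: false, final: 6 }
```

The value read through the import changes from 5 to 6 without any re-import, while the attempted write from the importing side has no effect.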

The reason to have live bindings like this is that you can then wire up all of the modules without running any code. This helps when you have cyclic dependencies, as described below.

So, at the end of this step, we have all of the instances wired up, along with the memory locations for the exported and imported variables.

Now we can begin to evaluate the code and populate these memory locations with their values.

Evaluation

The final step is to fill in these locations in memory. The JS engine does this by executing the top-level code (the code that is outside of functions).

Besides filling in these locations in memory, evaluating the code can also trigger side effects. For example, a module might make a call to a server.

Because of the potential for side effects, you only want to evaluate a module once. Unlike the linking that happens in instantiation, which can be done multiple times with exactly the same result, evaluation can have different results depending on how many times you do it.

This is one of the reasons to have the module map. The module map caches modules by canonical URL, so that there is only one module record for each module. This ensures that each module is executed only once. As with instantiation, this is done as a depth-first post-order traversal.

What about those cycles we talked about earlier?

With a cyclic dependency, you end up with a loop in the graph. Usually, this is a long loop. But to explain the problem, I’ll use a contrived example with a short loop.

Let’s take a look at how this would work with CommonJS modules. First, the main module executes up to the require statement. Then it goes to load the counter module.

counter.js then tries to access message from the export object. But since this has not been evaluated in the main module yet, it returns undefined. The JS engine allocates space in memory for the local variable and sets its value to undefined.

Evaluation continues down to the end of counter.js’s top-level code. We want to see whether we eventually get the correct value for message (after main.js is evaluated), so we set up a timeout. Then evaluation resumes in main.js.

The message variable is initialized and added to memory. But since there is no connection between the two, it stays undefined in the required module.

If the export were handled using live bindings, counter.js would eventually see the correct value. By the time the timeout runs, the evaluation of main.js has completed and the value has been filled in.

Supporting these loops is an important rationale behind the ES module design. It is this three-phase design that makes them possible.

Resources

ES modules: A cartoon deep-dive