ES modules: A cartoon deep-dive

ES modules bring an official, standardized module system to JavaScript. Getting here took a while: nearly 10 years of standardization work.

But the wait is almost over. With the release of Firefox 60 in May (currently in beta), all major browsers will support ES modules, and the Node Modules Working Group is currently working on adding ES module support to Node.js. ES module integration for WebAssembly is also underway.

Many JS developers know that ES modules have been controversial. But very few really understand how ES modules actually work.

Let’s take a look at what problem ES modules solve and how they differ from other module systems.

What problems does modularity solve?

When you think about it, writing JS code is mostly about managing variables. Almost everything we do is assigning values to variables, adding two variables together, or concatenating two variables and assigning the result to another variable.

Since so much of our code is just about changing the values of variables, how you organize those variables has a huge impact on how well you can write your code and how well you can maintain it.

Having only a few variables to think about at a time makes our job easier. JS has a way of helping with this, called scope. Because of the way scopes work in JS, a function cannot access variables that are defined in other functions.

This is great! It means that when you are working on one function, you only need to think about that function. You don’t have to worry about what other functions might be doing to your variables.

It has a downside, though: it makes it hard to share variables between different functions.
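The isolation described above takes only a few lines of JavaScript to show. The function names here are hypothetical. The point is only that a variable declared inside one function is invisible to another.

```javascript
// Each function gets its own scope: `total` exists only inside addItems.
function addItems() {
  const total = 2 + 3;
  return total;
}

// checkTotal cannot see addItems' local `total`, so referencing it throws.
function checkTotal() {
  try {
    return total; // ReferenceError: total is not defined
  } catch (err) {
    return err instanceof ReferenceError;
  }
}

console.log(addItems()); // 5
console.log(checkTotal()); // true
```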

What if you do want to share a variable outside of its scope? A common approach is to put it in a scope above, for example the global scope.

You probably remember this from the jQuery days. Before you could load any jQuery plugins, you had to make sure that jQuery was in the global scope.

This works, but it leads to some annoying problems.

First, all of your script tags need to be in the right order. Then you have to be very careful that nobody ever messes up that order.

If you do mess up the order, your app will throw an error partway through running. A function goes looking for jQuery where it expects to find it, on the global scope. If it doesn’t find it there, it throws an error and your app stops executing.

This makes code hard to maintain. Removing old code or script tags becomes a game of roulette, because you don’t know what might break. The dependencies between different parts of your code are implicit: any function can grab anything on the global scope, so you don’t know which functions depend on which scripts.
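Here is a small sketch of that ordering problem. Plain functions stand in for the script files (the names are hypothetical), and an ordinary object plays the part of the browser’s global scope (window).

```javascript
// "jquery.js": puts the library on the shared global object.
function jqueryScript(globalScope) {
  globalScope.jQuery = { plugins: {} };
}

// "plugin.js": silently assumes jQuery is already on the global object.
function pluginScript(globalScope) {
  if (!globalScope.jQuery) {
    throw new ReferenceError("jQuery is not defined");
  }
  globalScope.jQuery.plugins.carousel = () => "carousel ready";
}

// Runs the "script tags" in order against one shared global object.
function loadScripts(scripts) {
  const globalScope = {};
  for (const script of scripts) script(globalScope);
  return globalScope;
}

const page = loadScripts([jqueryScript, pluginScript]);
console.log(page.jQuery.plugins.carousel()); // carousel ready

try {
  loadScripts([pluginScript, jqueryScript]); // someone swapped the tags
} catch (err) {
  console.log(err.message); // jQuery is not defined
}
```

Nothing in pluginScript declares that it needs jqueryScript. The dependency exists only in the order of the calls.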

The second problem is that, because these variables live in the global scope, any code in that scope can change them. Malicious code can change a variable on purpose to make your code do something you didn’t intend, or non-malicious code can accidentally clobber your variable.

How does modularity help us?

Modules give you a better way to organize variables and functions. With modules, you group together the variables and functions that make sense to go together.

This puts those functions and variables into a module scope, which the functions in the module can use to share variables with each other.

But unlike function scopes, module scopes also have a way of making their variables available to other modules. A module can say explicitly which of its variables, classes, or functions should be available.

When something is made available to other modules, it is called an export. Once there is an export, other modules can explicitly say that they depend on that variable, class, or function.

Because this relationship is explicit, you can tell which modules will break if you remove another one.

Once you can export and import variables between modules, it becomes much easier to break your code into small pieces that work independently of each other. You can then combine and recombine those pieces, like Lego bricks, to build many different applications from the same set of modules.

Since modules are so useful, there have been many attempts to add module functionality to JS. Today, two module systems are in wide use. One is CommonJS, which Node.js has always used. The other is ES modules, a newer system that was designed specifically for JavaScript and added to the language specification. Browsers already support ES modules, and Node is working on adding support.

Let’s take a closer look at how this new module system actually works.

How do ES Modules work?

When you develop with modules, you build up a graph of dependencies. The connections between modules come from the import statements you write.

These import statements are how the browser or Node knows exactly what code it needs to load. You give it one file to use as the entry point to the graph, and from there it follows the import statements to find the rest of the code.

But the browser cannot use the files themselves directly. It has to parse them into data structures called module records, so that it knows what is in each file.

After that, each module record needs to be turned into a module instance. An instance combines two things: code and state.

Code is basically a list of instructions. It’s like a recipe for how to make something. But on its own, you can’t do much with it. You need raw materials to use with those instructions.

What is state? State gives you those raw materials. It is the actual values of the variables at any given point in time. Of course, these variables are just nicknames for the boxes in memory that hold the values.

So a module instance combines the code (the list of instructions) with the state (the values of all the variables in memory).

What we need is a module instance for each module. Module loading is the process of going from the entry point file to a full graph of module instances.

For ES modules, this happens in three steps:

  1. Construction – find, download, and parse all of the files into module records.
  2. Instantiation – find boxes in memory to hold all of the exported values (but don’t fill them in yet), then make both the exports and the imports point to those boxes. This is called linking.
  3. Evaluation – run the code to fill in the boxes with the variables’ actual values.
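To make the three phases concrete, here is a toy loader. It is only a sketch under simplifying assumptions: the "files" are hypothetical in-memory objects, "parsing" just reads their metadata, and an environment is a plain object whose properties stand in for boxes in memory. Real engines and loaders are far more involved.

```javascript
// Hypothetical in-memory "files". Each lists the files it imports, the names
// it exports, and a body standing in for its top-level code.
const files = {
  "main.js": {
    imports: ["counter.js"],
    exports: ["doubled"],
    body: (env, deps) => {
      env.doubled = deps["counter.js"].count * 2;
    },
  },
  "counter.js": {
    imports: [],
    exports: ["count"],
    body: (env) => {
      env.count = 21;
    },
  },
};

// 1. Construction: starting from the entry, "fetch" and "parse" every
//    reachable file into a record. The Map doubles as a tiny module map.
function construct(entry, sources) {
  const records = new Map();
  const pending = [entry];
  while (pending.length > 0) {
    const url = pending.shift();
    if (records.has(url)) continue;
    const file = sources[url];
    records.set(url, { ...file, env: null, deps: null, evaluated: false });
    pending.push(...file.imports);
  }
  return records;
}

// 2. Instantiation: create every export slot (left undefined), then link
//    each importer to the exporter's environment object so both share storage.
function instantiate(records) {
  for (const record of records.values()) {
    record.env = Object.fromEntries(record.exports.map((name) => [name, undefined]));
  }
  for (const record of records.values()) {
    record.deps = Object.fromEntries(record.imports.map((url) => [url, records.get(url).env]));
  }
}

// 3. Evaluation: run each body once, dependencies first, filling the slots.
function evaluate(url, records) {
  const record = records.get(url);
  if (record.evaluated) return;
  record.evaluated = true;
  for (const dep of record.imports) evaluate(dep, records);
  record.body(record.env, record.deps);
}

const records = construct("main.js", files);
instantiate(records);
console.log(records.get("main.js").env.doubled); // undefined
evaluate("main.js", records);
console.log(records.get("main.js").env.doubled); // 42
```

After instantiation every slot exists but is still empty. Only evaluation fills them in.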

People talk about ES modules being asynchronous. You can think of them that way because the work is split into three distinct phases (construction, instantiation, and evaluation), and those phases can be carried out separately.

This means the spec introduces a kind of asynchrony that CommonJS does not have. As I’ll explain later, in CommonJS a module and the dependencies below it are loaded, instantiated, and evaluated all at once, without any breaks in between.

However, the steps themselves are not necessarily asynchronous. They can be done synchronously. It depends on what is doing the loading, because not everything is governed by the ES module spec. There are really two halves of the work, covered by different specifications.

The ES module spec says how you should parse files into module records, and how you should instantiate and evaluate those modules. However, it doesn’t say how to get the files in the first place.

It is the loader that fetches the files, and the loader is defined in a different specification. For browsers, that spec is the HTML spec. But you can have different loaders depending on the platform you are using.

The loader also controls exactly how the modules are loaded. It calls the ES module methods ParseModule, Module.Instantiate, and Module.Evaluate. It’s kind of like a puppeteer pulling the JS engine’s strings.

Now let’s walk through each step in more detail.

Construction

Three things happen to each module during the construction phase.

  1. Figure out where to download the file containing the module from (aka module resolution).
  2. Fetch the file (by downloading it from a URL or loading it from the file system).
  3. Parse the file into a module record.

Finding and fetching the file

The loader takes care of finding the file and downloading it. First, it needs to find the entry point file. In HTML, you tell the loader where to find it with a script tag.

But how does it find the next set of modules, the ones that main.js directly depends on?

This is where import statements come in. One part of an import statement is called the module specifier. It tells the loader where it can find each next module.

One thing to note about module specifiers is that they sometimes need to be handled differently in browsers and in Node. Each host has its own way of interpreting module specifier strings. To do this, it uses something called a module resolution algorithm, which differs between platforms. Currently, some module specifiers that work in Node won’t work in the browser, but there is ongoing work to fix this.
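Here is a rough illustration of what a module resolution algorithm does. It is a simplified sketch of the browser rule described in this section. Specifiers that are full URLs, or that start with "/", "./" or "../", are resolved against the importing module’s URL. Anything else (a "bare" specifier such as lodash) is rejected. The function name resolveSpecifier and the example URLs are hypothetical. The URL class is the standard one available in browsers and Node.

```javascript
// Simplified browser-style resolution: relative specifiers are resolved
// against the referrer's URL, absolute URLs are taken as-is, and bare
// specifiers are rejected.
function resolveSpecifier(specifier, referrerURL) {
  const isRelative = /^(\/|\.\/|\.\.\/)/.test(specifier);
  if (isRelative) {
    return new URL(specifier, referrerURL).href;
  }
  try {
    // Absolute URLs parse without a base; bare names like "lodash" do not.
    return new URL(specifier).href;
  } catch {
    throw new TypeError(`Bare specifier "${specifier}" cannot be resolved`);
  }
}

const base = "https://example.com/js/main.js";
console.log(resolveSpecifier("./counter.js", base)); // https://example.com/js/counter.js
console.log(resolveSpecifier("../lib/util.js", base)); // https://example.com/lib/util.js
try {
  resolveSpecifier("lodash", base);
} catch (err) {
  console.log(err.name); // TypeError
}
```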

Until that is fixed, browsers only accept URLs as module specifiers. They load the module file from that URL. But this doesn’t happen for the whole graph at once. You don’t know which dependencies a module needs until you have parsed its file, and you can’t parse the file until you have fetched it.

This means we have to work through the graph layer by layer: parse one file, figure out its dependencies, then find and load those dependencies.
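The layer-by-layer discovery can be sketched as follows. The dependency table is hypothetical and is passed to the function up front only for convenience. A real loader learns each file’s imports only after fetching and parsing that file, which is exactly why every layer costs another round of requests.

```javascript
// Hypothetical import lists, normally only known after parsing each file.
const dependenciesOf = {
  "main.js": ["counter.js", "display.js"],
  "counter.js": ["util.js"],
  "display.js": ["util.js"],
  "util.js": [],
};

// Returns the files requested in each round: a file's imports cannot be
// requested until the file itself has arrived and been parsed.
function discoveryRounds(entry, deps) {
  const seen = new Set([entry]);
  const rounds = [];
  let current = [entry];
  while (current.length > 0) {
    rounds.push(current);
    const next = [];
    for (const file of current) {
      for (const dep of deps[file]) {
        if (!seen.has(dep)) {
          seen.add(dep);
          next.push(dep);
        }
      }
    }
    current = next;
  }
  return rounds;
}

console.log(JSON.stringify(discoveryRounds("main.js", dependenciesOf)));
// [["main.js"],["counter.js","display.js"],["util.js"]]
```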

If the main thread had to wait for each of these files to download, lots of other tasks would pile up in its queue.

Blocking the main thread like this would make an app that uses modules too slow to be usable. This is one of the reasons the ES module spec splits the algorithm into multiple phases. Splitting construction out into its own phase lets browsers fetch files and build up their understanding of the module graph before getting down to the synchronous work of instantiation.

This split into phases is one of the key differences between ES modules and CommonJS modules.

CommonJS can do things differently because loading files from the file system takes much less time than downloading them over the network. This means Node can block the main thread while it loads a file. And since the file is already loaded, it makes sense to instantiate and evaluate it right away (which is why these are not separate phases in CommonJS). It also means that before you return a module instance, you walk down the whole tree, loading, instantiating, and evaluating every dependency.

The CommonJS approach has a few implications, which I’ll explain more about later. One of them is that in Node with CommonJS modules, you can use variables in your module specifiers. You execute all of the code in a module (up to the require statement) before you look for the next module, which means the variable already has a value when you do module resolution.

But with ES modules, you build up the whole module graph beforehand, before any evaluation happens. This means you can’t use variables in your module specifiers, because those variables don’t have values yet.

However, sometimes it is really useful to use variables for module paths. For example, you might want to choose which module to load depending on what the code is doing or the environment it is running in.

To make this possible for ES modules, there is a proposal called dynamic import. With it, you can write an import like import(`${path}/foo.js`).

The way this works is that any file loaded with import() is treated as the entry point to a separate graph. The dynamically imported module starts a new graph, which is processed separately.

One thing to note, though: any module that appears in more than one of these graphs shares a single module instance. This is because the loader caches module instances. For each module in a particular global scope, there is only one module instance.

This means less work for the engine. For example, a module file is fetched only once, even if multiple modules depend on it. (That is one reason to cache modules. We’ll see another in the evaluation section.)

The loader manages this cache using something called a module map. Each global scope keeps track of its modules in its own separate module map.

When the loader goes to fetch a URL, it puts that URL in the module map and makes a note that the file is currently being fetched. Then it sends out the request and moves on to fetching the next file.

What happens when another module depends on the same file? The loader looks up each URL in the module map. If it sees that the file is already being fetched, it simply moves on to the next URL.

But the module map doesn’t just keep track of which files are being fetched. As we’ll see, it also serves as a cache for the modules.
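Here is a minimal sketch of the module map idea. It assumes a hypothetical fetchSource function in place of a real network request. The first request for a URL creates an entry marked as fetching. Any later request for the same URL gets the same pending result instead of triggering a second download.

```javascript
// One entry per URL, created as soon as the fetch starts.
function createModuleMap(fetchSource) {
  const map = new Map();
  return {
    load(url) {
      if (map.has(url)) {
        // Already fetching or fetched: share the same pending result.
        return map.get(url).promise;
      }
      const entry = { status: "fetching", promise: null };
      entry.promise = fetchSource(url).then((source) => {
        entry.status = "fetched";
        return source;
      });
      map.set(url, entry);
      return entry.promise;
    },
    statusOf(url) {
      return map.has(url) ? map.get(url).status : "absent";
    },
  };
}

let requests = 0;
const moduleMap = createModuleMap((url) => {
  requests += 1; // count how many real "downloads" happen
  return Promise.resolve(`// source of ${url}`);
});

const first = moduleMap.load("https://example.com/util.js");
const second = moduleMap.load("https://example.com/util.js");
console.log(first === second, requests); // true 1
console.log(moduleMap.statusOf("https://example.com/util.js")); // fetching
```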

Parsing

Now that we have fetched the file, we need to parse it into a module record. This helps the browser understand what the different parts of the module are.

Once the module record is created, it is placed in the module map. This means that whenever it is requested from then on, the loader can take it from the map.

There is one detail in parsing that may seem trivial but actually has big implications: all modules are parsed as if they had "use strict" at the top. There are also a couple of other differences. For example, the keyword await is reserved in module code, and the value of this at the top level is undefined.

This different way of parsing is called a "parse goal". If you parse the same file with different goals, you can get different results. So before you start parsing, you need to know what kind of file you are parsing: whether or not it is a module.
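Module code is always parsed as strict mode code, so strict and non-strict function bodies make a convenient stand-in for the two parse goals. This is an approximation for illustration, not an actual module parse. The same source text is accepted under one set of rules and rejected under the other, and this means something different in each.

```javascript
// Approximating the two parse goals with non-strict vs. strict function
// bodies. Module code always follows the strict rules.
const source = "with ({ answer: 42 }) { return answer; }";

function tryParse(code) {
  try {
    return { ok: true, fn: new Function(code) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const sloppy = tryParse(source);
const strict = tryParse(`"use strict"; ${source}`);
console.log(sloppy.ok, sloppy.fn()); // true 42
console.log(strict.ok, strict.error); // false SyntaxError

// `this` differs too: undefined under strict rules, the global object
// in a non-strict function called on its own.
const strictThis = new Function('"use strict"; return this;')();
const sloppyThis = new Function("return this;")();
console.log(strictThis === undefined, sloppyThis === globalThis); // true true
```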

In the browser this is easy: you just put type="module" on the script tag. That tells the browser to parse the file as a module. And since only modules can be imported, the browser knows that anything imported is a module too.

In Node, however, you don’t use HTML tags, so there is no type attribute. One solution the community has come up with is to give such files an .mjs extension. That extension tells Node, "this file is a module". You’ll see people talk about this as the signal for the parse goal. The discussion is still ongoing, and it is unclear which signal the Node community will eventually adopt.

Either way, the loader determines whether to parse a file as a module. If it is a module and it has imports, the loader starts the process over again until all of the files have been fetched and parsed.

The next step is to instantiate the module and link all the instances together.

Instantiation

As I said earlier, an instance combines code with state. That state lives in memory, so the instantiation step is all about wiring things up to memory.

First, the JS engine creates a module environment record, which manages the variables for the module. Then it finds boxes in memory for all of the exports. The module environment record keeps track of which box in memory is associated with each export.

These boxes in memory don’t get their values yet. Their actual values are filled in only during evaluation. There is one caveat to this rule: any exported function declarations are initialized during this phase. This makes evaluation easier.
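You can see the same split between function declarations and other bindings inside a single function, which follows the same rule as a module. Function declarations are initialized before any code runs. let bindings exist but stay uninitialized (the temporal dead zone) until their declaration is evaluated. The function name below is hypothetical. This is an analogy for what happens to exports, not module code itself.

```javascript
// Mirrors the instantiation rule: function declarations are ready before any
// line runs, while `let` bindings exist but cannot be read until initialized.
function runModuleLikeBody() {
  const early = helper(); // works: helper was initialized up front

  let tdzError = null;
  try {
    counter; // the binding exists, but it is still in the temporal dead zone
  } catch (err) {
    tdzError = err.name;
  }

  let counter = 1;
  function helper() {
    return "ready";
  }
  return { early, tdzError, counter };
}

console.log(JSON.stringify(runModuleLikeBody()));
// {"early":"ready","tdzError":"ReferenceError","counter":1}
```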

To instantiate the module graph, the engine does what is called a depth-first post-order traversal. This means it goes all the way down to the bottom of the graph, to the dependencies that don’t depend on anything else, and sets up their exports first.

The engine finishes wiring up all of the exports below a module, that is, all of the exports that the module depends on. Then it comes back up a level to wire up that module’s imports.

Note that the export and the import point to the same location in memory. Wiring up the exports first guarantees that every import can be connected to a matching export.
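Here is a sketch of a depth-first post-order traversal over a hypothetical module graph. It records only the order in which modules are finished. The specification’s actual algorithm does extra bookkeeping to handle cycles, which this sketch reduces to a simple visited set.

```javascript
// Visit a module's dependencies before the module itself, so the leaves of
// the graph come first. The visited set stops shared dependencies and cycles
// from being processed twice.
function postOrder(entry, deps) {
  const order = [];
  const visited = new Set();
  function visit(file) {
    if (visited.has(file)) return;
    visited.add(file);
    for (const dep of deps[file]) visit(dep);
    order.push(file);
  }
  visit(entry);
  return order;
}

// Hypothetical graph, including a cycle between main.js and counter.js.
const graph = {
  "main.js": ["counter.js", "display.js"],
  "counter.js": ["util.js", "main.js"],
  "display.js": ["util.js"],
  "util.js": [],
};

console.log(postOrder("main.js", graph).join(" -> "));
// util.js -> counter.js -> display.js -> main.js
```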

This is different from CommonJS modules. In CommonJS, exporting copies values onto the export object. This means that any values (such as numbers) that are exported are copies.

This also means that if the exporting module later changes that value, the importing module doesn’t see the change.

In contrast, ES modules use something called live bindings: the exporting and importing modules both point to the same location in memory. So when the exporting module changes a value, the change shows up in the importing module.

The module that exports a value can change it at any time, but modules that import it cannot change it. That said, if a module imports an object, it can change the property values on that object.
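The difference between copied exports and live bindings can be simulated in a single file. Both counters below are hypothetical. The second uses a getter, so readers always see the exporter’s current value, and because there is no setter, importers cannot assign to it. That approximates how an ES module namespace object behaves. It is not the engine’s real mechanism.

```javascript
// CommonJS-style: the number is copied onto the export object at export time.
function makeCopiedCounter() {
  let count = 0;
  return {
    count, // a copy of the current value (0)
    increment() {
      count += 1;
    },
  };
}

// ES-module-style: a getter reads the exporter's variable on every access,
// and with no setter, importers cannot assign to the binding.
function makeLiveCounter() {
  let count = 0;
  const namespace = {
    increment() {
      count += 1;
    },
  };
  Object.defineProperty(namespace, "count", { get: () => count, enumerable: true });
  return Object.freeze(namespace);
}

const copied = makeCopiedCounter();
copied.increment();
console.log(copied.count); // 0

const live = makeLiveCounter();
live.increment();
console.log(live.count); // 1

// Assigning through the "import" fails. In strict code it throws.
(function () {
  "use strict";
  try {
    live.count = 5;
  } catch (err) {
    console.log(err.name); // TypeError
  }
})();
```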

The reason for live bindings is that you can wire up all of the modules without running any code. This helps with evaluation when there are circular dependencies, which I’ll explain next.

So at the end of this step, all of the instances are in place, and the memory locations for the exported and imported variables are wired up.

Now we can start evaluating the code and filling in those memory locations with their values.

Evaluation

The final step is filling in these boxes in memory. The JS engine does this by executing the top-level code, that is, the code outside of functions.

Besides filling in these boxes in memory, evaluating the code can also trigger side effects. For example, a module might make a call to a server.

Because of these potential side effects, you want to evaluate each module only once. Linking, which happens during instantiation, can be done multiple times with exactly the same result. Evaluation, by contrast, can produce different results depending on how many times you do it.

This is one reason to have the module map. The module map caches each module by its canonical URL, so there is only one module record per module. That ensures each module is executed only once. As with instantiation, this is done as a depth-first post-order traversal.
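Here is a minimal sketch of why the cache matters for evaluation. It assumes a hypothetical table of module bodies in which one module has a side effect. Marking the module as evaluated before running it also means a cycle that leads back to it will not start it a second time.

```javascript
// The Set plays the role of the module map's "already evaluated" marker,
// so each module body runs at most once.
function createEvaluator(bodies) {
  const evaluated = new Set();
  return function evaluate(url) {
    if (evaluated.has(url)) return; // reuse the state from the first run
    evaluated.add(url); // mark first, so a cycle back here stops immediately
    bodies[url]();
  };
}

let serverCalls = 0;
const evaluate = createEvaluator({
  // Top-level code with a side effect, such as a call to a server.
  "analytics.js": () => {
    serverCalls += 1;
  },
});

// Two different importers both depend on analytics.js.
evaluate("analytics.js");
evaluate("analytics.js");
console.log(serverCalls); // 1
```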

What about the circular dependencies we talked about earlier?

In a circular dependency, you end up with a loop in the graph. Usually this is a long loop. But to explain the problem, I’ll use a contrived example with a short loop.

Let’s look at how this works with CommonJS modules. First, the main module executes up to its require statement. Then it goes to load the counter module.

The counter module then tries to read message from main’s export object. But main has not finished evaluating yet, so this returns undefined. The JS engine allocates space in memory for the local variable and sets its value to undefined.

Evaluation continues to the end of the counter module’s top-level code. We want to see whether we eventually get the correct value for message (after main.js is evaluated), so we set up a timeout. Then evaluation resumes in main.js.

The message variable is initialized and added to memory. But since there is no connection between the two, it stays undefined in the required module.

If the export were handled with a live binding, the counter module would eventually see the correct value. By the time the timeout runs, main.js would have finished evaluating and filled in the value.

Supporting circular dependencies is one of the main reasons ES modules are designed this way. It is the three-phase design that makes them possible.

What is the current state of ES modules?

With the release of Firefox 60 in early May, all major browsers will support ES modules by default. Node is also adding support, and its working group is figuring out how to make CommonJS and ES modules work together.

This means you’ll be able to use a script tag with type=module, along with import and export. More module features are still on the way. The dynamic import proposal is already at Stage 3, and the import.meta proposal will help support Node.js use cases. The module resolution proposal will also help smooth over differences between browsers and Node.js. So you can expect working with modules to keep getting better.