ES Modules: A Cartoon Deep-dive
Preface
ES modules bring an official, standardized module system to JavaScript. Getting here took almost 10 years of standardization work.
But the good news is that the wait is almost over. With the release of Firefox 60, ES modules will be supported in all major browsers, and the Node.js modules working group is working on adding ES module support to Node.js. ES module integration for WebAssembly is also in the works.
Many JS developers know that ES modules have been controversial, but few really understand how they work.
Let’s take a look at what problems ES modules solve and how they differ from other module systems.
What problems do modular systems solve?
When you think about it, writing JS code is all about managing variables: assigning values to variables, changing a variable’s value through some calculation, or combining two variables and putting the result into another variable.
Because so much of your code is about changing variables, how you organize those variables has a big impact on how well you can write your code and how well you can maintain it.
Having only a few variables to think about at one time makes things easier. JS helps you do this with scope. Because of how scopes work in JS, one function can’t access variables that are defined in another function.
This is good design, because it means that when you’re working in one function, you only have to think about that one function. You don’t need to worry about what other functions might be doing to your variables.
It does have a downside, though: it makes it hard to share variables between different functions.
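To make the scoping rule concrete, here is a small sketch with two hypothetical functions, increment and readCount. The count variable declared inside increment simply doesn’t exist from readCount’s point of view, so reading it there throws a ReferenceError.

```javascript
// Each function gets its own scope: `count` inside increment() is invisible elsewhere.
function increment() {
  let count = 0; // local to increment()
  count += 1;
  return count;
}

function readCount() {
  // `count` is not in scope here, so reading it throws a ReferenceError.
  try {
    return count;
  } catch (err) {
    return err instanceof ReferenceError ? "not in scope" : "unexpected error";
  }
}

console.log(increment()); // 1
console.log(readCount()); // not in scope
```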
What if you do want to share a variable outside of a scope? A common way to handle this is to put it in a scope above, for example the global scope.
You may remember what it was like back in the jQuery days when you had to load jQuery in the global scope before you could load a jQuery plug-in.
This works, but it creates some annoying problems.
First, all of your script tags need to be in the right order, and you have to be careful that nobody messes up that order.
If you do mess up the order, your app will throw an error partway through running: when a plug-in looks for jQuery in the global scope and doesn’t find it, execution stops.
This makes maintaining code tricky. Removing old code or script tags becomes a gamble, because you never know what might break. The dependencies between different parts of your code are implicit: any function can grab anything in the global scope, so you don’t know which functions depend on which scripts.
The second problem is that because these variables live in the global scope, any code that runs in that scope can change them. Malicious code can change a variable on purpose to make your code do something you didn’t intend, and non-malicious code can accidentally clobber your variables.
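Both problems can be replayed without a browser. In this sketch, a plain object stands in for the global scope, and two hypothetical functions stand in for the jQuery and plug-in script tags; the function names and the highlight plug-in are made up for illustration.

```javascript
// A plain object standing in for the browser's global scope (window).
function makeGlobal() {
  return {};
}

// Hypothetical stand-in for the jQuery <script>: it publishes jQuery globally.
function jqueryScript(global) {
  global.jQuery = { fn: {} };
}

// Hypothetical plug-in: nothing declares that it needs jQuery, it just reaches for it.
function pluginScript(global) {
  if (!global.jQuery) {
    throw new ReferenceError("jQuery is not defined");
  }
  global.jQuery.fn.highlight = () => "highlighted";
}

// Runs the "script tags" in order; an exception stops everything after it.
function runScripts(scripts) {
  const global = makeGlobal();
  for (const script of scripts) {
    script(global);
  }
  return global;
}

const ok = runScripts([jqueryScript, pluginScript]);
console.log(typeof ok.jQuery.fn.highlight); // function

// Problem 1: the wrong order breaks the page.
try {
  runScripts([pluginScript, jqueryScript]);
} catch (err) {
  console.log(err.message); // jQuery is not defined
}

// Problem 2: any script can overwrite a shared global.
const clobbered = runScripts([jqueryScript, pluginScript, (g) => { g.jQuery = null; }]);
console.log(clobbered.jQuery); // null
```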
How can modules help?
Modules give you a better way to organize these variables and functions. With modules, you group together the variables and functions that make sense to go together.
The module puts these functions and variables into a module scope. The module scope lets the functions in the module share variables.
But unlike function scopes, module scopes have a way of making their variables available to other modules. They can say explicitly which of the variables, classes, or functions in the module should be available.
When something is made available to other modules, it’s called an export. Once you have an export, other modules can explicitly say that they depend on that variable, class, or function; this is called an import.
Because this is an explicit relationship, you can tell which modules will break if you remove another one.
Once you can export and import variables between modules, it’s much easier to break your code into small chunks that work independently of each other. Then you can combine and recombine these chunks, like Lego blocks, to create all kinds of applications.
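Before ES modules, a common way to approximate module scope was to wrap code in a function and return only what should be public. The sketch below uses that closure approach with a hypothetical defineModule helper. It is not how ES modules are implemented, but it shows the same ideas: private state, explicitly declared exports and imports, and, because dependencies are declared, the ability to list which modules depend on a given one.

```javascript
// Hypothetical registry: module name -> { deps, exports }.
const registry = new Map();

function defineModule(name, deps, factory) {
  // Imports are explicit: each dependency's exports are passed in by name.
  const imports = deps.map((dep) => {
    const mod = registry.get(dep);
    if (!mod) throw new Error(`${name} depends on missing module ${dep}`);
    return mod.exports;
  });
  // The factory's local variables act as the "module scope";
  // only the object it returns is visible to other modules.
  const exports = Object.freeze(factory(...imports));
  registry.set(name, { deps, exports });
  return exports;
}

// Because dependencies are declared, we can list who breaks if a module is removed.
function dependentsOf(name) {
  return [...registry].filter(([, mod]) => mod.deps.includes(name)).map(([n]) => n);
}

defineModule("counter", [], () => {
  let count = 0; // private: no other module can touch this directly
  return { increment: () => ++count };
});

defineModule("main", ["counter"], (counter) => {
  return { run: () => counter.increment() };
});

const main = registry.get("main").exports;
console.log(main.run(), main.run()); // 1 2
console.log(dependentsOf("counter")); // [ 'main' ]
```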
Because modules are so useful, there have been many attempts to add module functionality to JS. Today, two module systems are widely used. CommonJS (CJS) is what Node.js has used historically. ESM (ECMAScript modules) is a newer system that has been added to the JavaScript specification. Browsers already support ES modules, and Node.js is adding support.
How do ES Modules work?
When you’re developing with modules, you build up a graph of dependencies. The connections between dependencies come from the import statements you use.
These import statements are how the browser or Node knows exactly what code it needs to load. You give it a file to use as the entry point to the graph, and from there it follows the import statements to find the rest of the code.
But the files themselves aren’t something the browser can use directly. It needs to parse each file and turn it into a data structure called a Module Record. That way, it actually knows what’s going on in the file.
After that, the module record needs to be turned into a module instance. A module instance combines two things: the code and the state.
The code is basically a set of instructions, like a recipe. But by itself, the code can’t do anything. You need raw materials to use with those instructions.
What is state? State gives you those raw materials. State is the actual values of the variables at any point in time. Of course, these variables are just names for the locations in memory that hold the values.
So the module instance combines the code (the list of instructions) with the state (all the variables’ values).
What we need is a module instance for each module. The process of module loading is going from the entry point file to having a full graph of module instances.
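The difference between a module record and a module instance can be sketched as two plain objects. This is a toy model under loud assumptions: the field name requestedModules is borrowed loosely from the spec’s [[RequestedModules]], and the regular expression is a stand-in for a real parser that only handles simple import statements.

```javascript
// Toy "parse" step: collect the string-literal specifiers from simple imports like
//   import { x } from "./x.js";   or   import "./y.js";
// A real engine uses a full parser; this regex is only an illustration.
function parseModuleRecord(url, source) {
  const requestedModules = [...source.matchAll(/import\s[^"']*["']([^"']+)["']/g)].map((m) => m[1]);
  return { url, source, requestedModules };
}

// A module instance pairs the code (the record) with state (variable bindings).
function createModuleInstance(record) {
  return { record, state: new Map() }; // binding name -> value, empty until evaluation
}

const record = parseModuleRecord(
  "https://example.com/main.js",
  'import { count } from "./counter.js";\nimport "./log.js";\nconsole.log(count);'
);
const instance = createModuleInstance(record);

console.log(record.requestedModules); // [ './counter.js', './log.js' ]
console.log(instance.state.size); // 0
```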
In ES Modules, there are three steps.
- Construction: find, download, and parse all of the files into module records.
- Instantiation: find locations in memory to hold all of the exported values (but don’t fill them in with values yet). Then make both exports and imports point to those locations in memory. This is called linking.
- Evaluation: run the code to fill in the variables’ actual values in memory.
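Here is a toy loader that runs the three phases as separate passes over a hypothetical in-memory set of files, logging each step. It is a simplification, not the spec algorithm: instantiation just creates an empty environment per module and gives each importer a reference to its dependencies’ environments, and evaluation runs dependencies before the modules that import them.

```javascript
// Hypothetical in-memory "server": each module lists its dependencies, the names
// it exports, and a body that fills in values when evaluated.
const server = {
  "main.js": {
    deps: ["counter.js"],
    exportNames: [],
    body: (env, imports) => { env.result = imports["counter.js"].count + 1; },
  },
  "counter.js": {
    deps: [],
    exportNames: ["count"],
    body: (env) => { env.count = 41; },
  },
};

const log = [];

// Phase 1, construction: fetch and "parse" every reachable file into a record.
function construct(entry) {
  const records = new Map();
  const queue = [entry];
  while (queue.length > 0) {
    const url = queue.shift();
    if (records.has(url)) continue;
    records.set(url, server[url]);
    log.push(`construct ${url}`);
    queue.push(...server[url].deps);
  }
  return records;
}

// Phase 2, instantiation: exported names exist but hold no value yet,
// and each importer gets the very same environment objects (not copies).
function instantiate(records) {
  const envs = new Map();
  for (const [url, rec] of records) {
    const env = {};
    for (const name of rec.exportNames) env[name] = undefined; // empty slot
    envs.set(url, env);
    log.push(`instantiate ${url}`);
  }
  const imports = new Map();
  for (const [url, rec] of records) {
    const depEnvs = {};
    for (const dep of rec.deps) depEnvs[dep] = envs.get(dep);
    imports.set(url, depEnvs);
  }
  return { envs, imports };
}

// Phase 3, evaluation: run each body once, dependencies first.
function evaluate(url, records, linked, done = new Set()) {
  if (done.has(url)) return;
  done.add(url);
  for (const dep of records.get(url).deps) evaluate(dep, records, linked, done);
  records.get(url).body(linked.envs.get(url), linked.imports.get(url));
  log.push(`evaluate ${url}`);
}

const records = construct("main.js");
const linked = instantiate(records);
evaluate("main.js", records, linked);
console.log(linked.envs.get("main.js").result); // 42
console.log(log);
```

Every construct entry in the log comes before any instantiate entry, and every instantiate entry before any evaluate entry, which is the separation the three phases describe.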
People talk about ES modules being asynchronous. You can think of them as asynchronous because the work is split into three different phases (construction, instantiation, and evaluation), and those phases can be done separately.
This means the spec does introduce a kind of asynchrony that isn’t present in CommonJS. In CommonJS, a module and the dependencies below it are loaded, instantiated, and evaluated all at once, without any breaks in between.
However, the steps themselves are not necessarily asynchronous; they can also be done synchronously. It depends on what is doing the loading, because not everything is controlled by the ES module spec.
The ES module spec says how to parse files into module records and how to instantiate and evaluate them. However, it doesn’t say how to get the files in the first place.
The loader is what fetches the files, and it is specified in a different specification. For browsers, that specification is the HTML spec. Different platforms can use different loaders.
Construction
During the construction phase, three things happen for each module:
- Figure out where to download the file containing the module from (this is called module resolution)
- Fetch the file (by downloading it from a URL or loading it from the file system)
- Parse the file into a module record
Find and retrieve files
The loader takes care of finding the file and downloading it. First it needs to find the entry point file. In HTML, you tell the loader where to find it by using a script tag.
But how does it find the next set of modules, the ones that main.js directly depends on?
This is where import statements come in. The part of the import statement that comes after from is called the module specifier. It tells the loader where it can find each next module.
One thing to note about module specifiers: sometimes they need to be handled differently in browsers and in Node. Each host has its own way of interpreting the module specifier strings. To do this, it uses something called a module resolution algorithm, which differs between platforms. Currently, some module specifiers that work in Node won’t work in the browser.
Until that gets fixed, browsers only accept URLs as module specifiers, and they load the module file from that URL. But this doesn’t happen for the whole graph at once: the browser doesn’t know what dependencies a module needs it to fetch until it has parsed the file, and it can’t parse the file until it has fetched it.
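The URL-only rule can be sketched with the standard URL class, which both browsers and Node provide. The resolveSpecifier function is hypothetical and deliberately simplified: specifiers starting with /, ./ or ../ are resolved against the importing module’s URL, absolute URLs pass through unchanged, and bare names such as "lodash" are rejected.

```javascript
// Hypothetical, simplified resolver modeling only the URL-only rule described above.
function resolveSpecifier(specifier, baseURL) {
  if (specifier.startsWith("/") || specifier.startsWith("./") || specifier.startsWith("../")) {
    return new URL(specifier, baseURL).href; // relative to the importing module
  }
  try {
    return new URL(specifier).href; // already an absolute URL
  } catch {
    throw new TypeError(`Bare specifier "${specifier}" is not a URL`);
  }
}

const base = "https://example.com/js/main.js";
console.log(resolveSpecifier("./counter.js", base)); // https://example.com/js/counter.js
console.log(resolveSpecifier("../lib/util.js", base)); // https://example.com/lib/util.js
console.log(resolveSpecifier("https://cdn.example.com/a.js", base)); // https://cdn.example.com/a.js

try {
  resolveSpecifier("lodash", base);
} catch (err) {
  console.log(err.message); // Bare specifier "lodash" is not a URL
}
```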
This means that we have to go through the tree layer by layer: parse one file, figure out its dependencies, and then find and load those dependencies.
Downloads can take a long time, so if the main thread waited for each of these files to download, a lot of other tasks would pile up in its queue.
Blocking the main thread like this would make an app that uses modules too slow to use. This is one of the reasons the ES module spec splits the algorithm into multiple phases. Splitting construction out into its own phase allows browsers to fetch files and build up their understanding of the module graph before getting down to the synchronous work of instantiation.
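The layer-by-layer process can be sketched as a breadth-first walk over a hypothetical in-memory set of files. fetchSource simulates an asynchronous download with setTimeout, and parseSpecifiers is a toy stand-in for a real parser. A layer can only be requested after the previous one has been fetched and parsed, but the files within one layer are fetched concurrently, and the waiting happens in promises instead of blocking the main thread.

```javascript
// Hypothetical in-memory "network": URL -> source text.
const files = {
  "/main.js": 'import "/a.js"; import "/b.js";',
  "/a.js": 'import "/c.js";',
  "/b.js": 'import "/c.js";',
  "/c.js": "",
};

// Simulated async download that settles on a later tick.
function fetchSource(url) {
  return new Promise((resolve, reject) => {
    setTimeout(() => (url in files ? resolve(files[url]) : reject(new Error(`404 ${url}`))), 0);
  });
}

// Toy parse step: only finds string-literal specifiers directly after `import`.
function parseSpecifiers(source) {
  return [...source.matchAll(/import\s*["']([^"']+)["']/g)].map((m) => m[1]);
}

// Dependencies are only known once a file is fetched and parsed,
// so each layer waits for the previous one.
async function buildGraph(entry) {
  const graph = new Map(); // url -> dependency urls
  const layers = [];
  let layer = [entry];
  while (layer.length > 0) {
    layers.push(layer);
    const sources = await Promise.all(layer.map((url) => fetchSource(url))); // one layer concurrently
    const next = [];
    layer.forEach((url, i) => {
      const deps = parseSpecifiers(sources[i]);
      graph.set(url, deps);
      for (const dep of deps) {
        if (!graph.has(dep) && !layer.includes(dep) && !next.includes(dep)) next.push(dep);
      }
    });
    layer = next;
  }
  return { graph, layers };
}

// Logs three layers: main.js, then a.js and b.js, then c.js.
buildGraph("/main.js").then(({ layers }) => console.log(layers));
```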
This approach of splitting the algorithm into phases is one of the key differences between ES modules and CommonJS modules.
CommonJS can do things differently because loading files from the file system takes much less time than downloading them across the Internet. This means Node can block the main thread while it loads a file. And since the file is already loaded, it makes sense to instantiate and evaluate it right away (instantiation and evaluation aren’t separate phases in CommonJS). This also means that Node walks down the whole tree, loading, instantiating, and evaluating every dependency before it returns the module instance.
One consequence is that in Node with CommonJS modules, you can use variables in your module specifier. Node executes all of the code in the module up to the require statement before it looks for the next module, so the variable already has a value when module resolution happens.
With ES modules, though, you build up the whole module graph before doing any evaluation. Variables have no values at that stage, so you can’t use variables in module specifiers.
But sometimes it really is useful to use variables for module paths. For example, you might want to switch which module you load depending on what the code is doing or what environment it is running in.
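Here is why the variable works in CommonJS, replayed with a hypothetical in-memory stand-in for Node’s loader (the sources object and fakeRequire are made up for illustration, not Node’s actual implementation). Module code runs top to bottom, so the line computing name has already run by the time fakeRequire(name) is called. In an ES module, an import declaration whose specifier is a variable would be a syntax error, since that specifier must be a string literal.

```javascript
// Hypothetical in-memory module sources, each wrapped in a function
// the way a CommonJS loader wraps a file's code.
const sources = {
  "./logger-dev.js": (mod) => { mod.exports = { level: "debug" }; },
  "./logger-prod.js": (mod) => { mod.exports = { level: "warn" }; },
  "./main.js": (mod, fakeRequire, env) => {
    // This line runs before the require call below, so name has a value in time.
    const name = env === "production" ? "./logger-prod.js" : "./logger-dev.js";
    mod.exports = fakeRequire(name);
  },
};

function runMain(env) {
  const cache = new Map();
  function fakeRequire(name) {
    if (cache.has(name)) return cache.get(name).exports;
    const mod = { exports: {} };
    cache.set(name, mod);
    sources[name](mod, fakeRequire, env); // load, instantiate and evaluate in one go
    return mod.exports;
  }
  return fakeRequire("./main.js");
}

console.log(runMain("production").level); // warn
console.log(runMain("development").level); // debug
```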
ES modules support this through dynamic import, with the syntax import(path).
Any file loaded with import() is treated as the entry point to a separate graph: the dynamically imported module starts a new module graph, which is processed separately.
One thing to note, though: any module that is in both graphs will share a module instance. The loader caches module instances, so for each module in a particular global scope there is only one module instance.
That means less work for the engine. For example, a module file is fetched only once, even if multiple modules depend on it. That is one reason to cache modules; we’ll see another in the evaluation section.
The loader manages this cache using a data structure called the module map. Each global scope keeps track of its modules in a separate module map.
When the loader goes to fetch a URL, it puts that URL in the module map and marks it as currently being fetched. Then it sends out the request and moves on to start fetching the next file.
What happens if another module depends on the same file? The loader looks up each URL in the module map. If the URL is already marked as fetching, the loader just moves on to the next URL.
But the module map doesn’t just keep track of which files are being fetched. It also serves as a cache for the modules.
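A module map can be sketched as a Map keyed by URL. In this hypothetical ModuleMap class, each entry starts out marked as fetching and holds the pending promise, so a second request for the same URL reuses the in-flight download instead of starting another one. The fetcher passed in is a stand-in for a real network request.

```javascript
// Minimal sketch of a module map (hypothetical class, not a real browser API).
class ModuleMap {
  constructor(fetcher) {
    this.fetcher = fetcher;
    this.entries = new Map(); // url -> { status, promise, record }
  }

  load(url) {
    const existing = this.entries.get(url);
    if (existing) return existing.promise; // already fetching or fetched: reuse it
    const entry = { status: "fetching", promise: null, record: null };
    entry.promise = this.fetcher(url).then((source) => {
      entry.status = "fetched";
      entry.record = { url, source }; // parsing into a module record would happen here
      return entry.record;
    });
    this.entries.set(url, entry);
    return entry.promise;
  }
}

let requests = 0;
const map = new ModuleMap(async (url) => {
  requests += 1; // counts real downloads
  return `// source of ${url}`;
});

// Two modules depend on the same URL; the second lookup finds it marked "fetching".
const p1 = map.load("https://example.com/counter.js");
const p2 = map.load("https://example.com/counter.js");
console.log(p1 === p2, requests); // true 1
console.log(map.entries.get("https://example.com/counter.js").status); // fetching
```

Storing the promise rather than a bare flag is a design choice of this sketch: later callers can wait on the same download instead of polling the map.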
Parsing
Once we have fetched the file, we need to parse it into a module record. This helps the browser understand what the different parts of the module are.
Once the module record is created, it is placed in the module map. This means that whenever it’s requested from then on, the loader can pull it from that map.
Even a small detail in parsing can make a big difference. For example, all modules are parsed as if they had "use strict" at the top. There are other slight differences too: the keyword await is reserved in a module’s top-level code, and the value of this at the top level is undefined.
This different way of parsing is called a parse goal. If you parse the same file using different goals, you’ll get different results. So before you start parsing, you need to know what kind of file you’re parsing: whether it’s a module or not.
In browsers this is easy: you just put type="module" on the script tag. This tells the browser that the file should be parsed as a module. And since only modules can be imported, the browser knows that any imported files are modules, too.
But in Node, you don’t use HTML tags, so you can’t use a type attribute. One solution the community has tried is the .mjs extension, which tells Node that the file is a module.
Either way, the loader determines whether to parse the file as a module. If it is a module and it has imports, the loader starts the process over again for those imports, until all of the files are fetched and parsed.
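The decision the loader makes can be sketched as a small function. chooseParseGoal is hypothetical and models only the signals described above: type="module" (or being imported from a module) in the browser, and the .mjs extension in Node. Real hosts have more rules than this.

```javascript
// Hypothetical helper modeling only the parse-goal signals described above.
function chooseParseGoal({ host, url, scriptType = null, importedByModule = false }) {
  if (host === "browser") {
    // <script type="module"> marks the entry; anything imported from a module is a module too.
    return scriptType === "module" || importedByModule ? "module" : "script";
  }
  if (host === "node") {
    // In this sketch, only the .mjs extension marks a file as an ES module.
    return url.endsWith(".mjs") ? "module" : "script";
  }
  throw new Error(`unknown host: ${host}`);
}

console.log(chooseParseGoal({ host: "browser", url: "/main.js", scriptType: "module" })); // module
console.log(chooseParseGoal({ host: "browser", url: "/counter.js", importedByModule: true })); // module
console.log(chooseParseGoal({ host: "browser", url: "/legacy.js" })); // script
console.log(chooseParseGoal({ host: "node", url: "main.mjs" })); // module
console.log(chooseParseGoal({ host: "node", url: "main.js" })); // script
```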
At the end of this loading process, we’ve gone from having a single entry point file to having a whole set of module records.
The next step is to instantiate all the modules and integrate them together.
Instantiation
As I mentioned earlier, a module instance combines code with state. That state lives in memory, so the instantiation step is all about wiring things up to memory.
First, the JS engine creates a module environment record, which manages the variables for the module record. Then it finds locations in memory for all of the exports. The module environment record keeps track of which memory location is associated with each export.
These memory locations don’t get their values yet; the actual values are filled in only after evaluation. There is one exception to this rule: exported function declarations are initialized during this phase, which makes evaluation easier.
To instantiate the module graph, the engine does a depth-first post-order traversal. This means it goes down to the bottom of the graph, to the dependencies that don’t depend on anything else, and sets up their exports first.
Once the engine has wired up all of the exports below a module, it comes back up a level to wire up that module’s imports. Note that both the export and the import point to the same location in memory. Wiring up the exports first guarantees that every import can be connected to a matching export.
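Here is a toy version of instantiation for a hypothetical two-module graph. Each export gets a box (an object with a value property), only function declarations are filled in right away, and each import is pointed at the exporter’s box itself. The traversal is depth-first post-order; cycles, which need extra bookkeeping, are left out of this sketch.

```javascript
// Hypothetical two-module graph: main.js imports count and increment from counter.js.
const modules = {
  "main.js": {
    deps: ["counter.js"],
    exportKinds: {},
    imports: { count: "counter.js", increment: "counter.js" }, // local name -> source module
  },
  "counter.js": {
    deps: [],
    exportKinds: { count: "let", increment: "function" },
    functionDeclarations: { increment: () => "incremented" },
    imports: {},
  },
};

const environments = new Map(); // url -> { name: box }
const visited = new Set();
const order = [];

// Depth-first post-order: wire up everything below a module before the module itself.
function instantiate(url) {
  if (visited.has(url)) return;
  visited.add(url);
  const mod = modules[url];
  for (const dep of mod.deps) instantiate(dep);

  const env = {};
  // Exports: each gets an empty box, except function declarations,
  // which are initialized during this phase.
  for (const [name, kind] of Object.entries(mod.exportKinds)) {
    env[name] = { value: kind === "function" ? mod.functionDeclarations[name] : undefined };
  }
  // Imports: point at the exporter's box itself, not at a copy of its value.
  for (const [name, from] of Object.entries(mod.imports)) {
    const box = environments.get(from)[name];
    if (!box) throw new SyntaxError(`${from} does not export ${name}`);
    env[name] = box;
  }
  environments.set(url, env);
  order.push(url);
}

instantiate("main.js");
const counterEnv = environments.get("counter.js");
const mainEnv = environments.get("main.js");
console.log(order); // [ 'counter.js', 'main.js' ]
console.log(mainEnv.count === counterEnv.count); // true: one shared box
console.log(mainEnv.count.value); // undefined until evaluation fills it in
console.log(mainEnv.increment.value()); // incremented
```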
This is different from how CommonJS modules work. In CommonJS, when a module exports a value such as a number, the exports object receives a copy of that value. If the exporting module later changes its variable, the importing module doesn’t see that change.
In contrast, ES modules use something called live bindings. Both the exporting module and the importing module point to the same location in memory, so when the exporting module changes a value, that change shows up in the importing module.
Modules that export values can change those values at any time, but importing modules cannot change the values of their imports. That being said, if a module imports an object, it can change the property values on that object.
Evaluation
The final step is filling in these locations in memory. The JS engine does this by executing the top-level code, that is, the code outside of functions.
Besides filling in these locations in memory, evaluating the code can also trigger side effects. For example, a module might make a call to a server.
Because of the potential for side effects, you only want to evaluate each module once. The linking done during instantiation gives exactly the same result no matter how many times it is done, but evaluation can give different results depending on how many times you run it.
This is one reason to have the module map. The module map caches modules by canonical URL so that there is only one module record for each module, which ensures that each module is executed only once. Just as with instantiation, evaluation is done as a depth-first post-order traversal.
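Evaluation order and the run-once guarantee can be sketched with a hypothetical four-module graph in which two modules share a dependency. The status map plays the role of the module map here, so shared.js runs only once even though two modules import it.

```javascript
// Hypothetical graph: url -> dependency urls.
const graph = {
  "main.js": ["a.js", "b.js"],
  "a.js": ["shared.js"],
  "b.js": ["shared.js"],
  "shared.js": [],
};

const sideEffects = []; // records each time a module's top-level code "runs"
const status = new Map(); // url -> "evaluating" | "evaluated"

// Depth-first post-order: dependencies run before the modules that import them.
function evaluate(url) {
  if (status.has(url)) return; // already evaluated (or in progress): never run twice
  status.set(url, "evaluating");
  for (const dep of graph[url]) evaluate(dep);
  sideEffects.push(url); // stand-in for running the module's top-level code
  status.set(url, "evaluated");
}

evaluate("main.js");
console.log(sideEffects); // [ 'shared.js', 'a.js', 'b.js', 'main.js' ]
```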
What about circular dependencies we mentioned earlier?
In a circular dependency, you end up with a loop in the graph. Usually this is a long loop, but to explain the problem, I’ll use a contrived example with a short loop.
Let’s look at how this would work with CommonJS modules. First, the main module executes up to the require statement. Then it goes to load the counter module.
The counter module then tries to access message from the main module’s export object. But since message hasn’t been evaluated in the main module yet, this returns undefined. The JS engine allocates space in memory for the local variable and sets its value to undefined.
Evaluation continues down to the end of the counter module’s top-level code. We want to see whether we’ll eventually get the correct value for message (after main.js is evaluated), so we set up a timeout. Then evaluation resumes in main.js.
The message variable is initialized and added to memory. But since there’s no connection between the two, it stays undefined in the counter module.
If the export were handled using live bindings, the counter module would see the correct value eventually. By the time the timeout runs, main.js’s evaluation would have completed and filled in the value.
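The cycle described above can be replayed in plain JavaScript. The first part uses a hypothetical CommonJS-style loader with a cache (cjsSources and loadCjs are made-up names), and an array of callbacks stands in for the timeout. The second part models only the timeout check with a live binding, using a getter as a stand-in for the real import.

```javascript
// Part 1: hypothetical CommonJS-style loader with a cache.
const deferred = []; // stands in for setTimeout callbacks; run after main.js finishes
const seen = []; // what counter.js observes for message

const cjsSources = {
  "main.js": (mod, req) => {
    req("counter.js"); // main.js pauses here while counter.js runs
    const message = "Eval complete";
    mod.exports.message = message; // only now does main.js export a value
  },
  "counter.js": (mod, req) => {
    const { message } = req("main.js"); // main.js is unfinished, so its exports are still {}
    seen.push(message); // undefined
    deferred.push(() => seen.push(message)); // still undefined later: it is a copy
  },
};

function loadCjs(entry) {
  const cache = new Map();
  function req(name) {
    if (cache.has(name)) return cache.get(name).exports; // in a cycle, a partial exports object
    const mod = { exports: {} };
    cache.set(name, mod);
    cjsSources[name](mod, req); // load, instantiate and evaluate in one go
    return mod.exports;
  }
  req(entry);
  deferred.splice(0).forEach((fn) => fn()); // "the timeout fires"
}

loadCjs("main.js");
console.log(seen); // [ undefined, undefined ]

// Part 2: the same timeout check through a simulated live binding.
const seenLive = [];
function runWithLiveBinding() {
  let message; // main.js's binding: exists after instantiation, not yet filled in
  const mainNamespace = { get message() { return message; } }; // what counter.js imports
  const timeouts = [() => seenLive.push(mainNamespace.message)]; // set up by counter.js
  message = "Eval complete"; // main.js evaluates and fills in the binding
  timeouts.forEach((fn) => fn()); // the timeout fires after main.js is done
}
runWithLiveBinding();
console.log(seenLive); // [ 'Eval complete' ]
```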
Supporting cycles like these is an important rationale behind the design of ES modules.