Introduction

This is the 20th article in the advanced JavaScript tutorial series, and it explains in detail how ESM works. Note that this article draws heavily on a referenced document for both its content and its images.

Main text

1. What is ESM

ESM (ECMAScript Modules) brings an official, standardized module system to JS, but it took nearly 10 years to get there. With the release of Firefox 60 in 2018, all major browsers support ESM. Many JS developers know that modules have been controversial (there were several competing approaches before ESM arrived), but few really understand how ESM works.

Let’s take a look at what problems ESM solves and how it differs from other module specifications.

1.1 What problems can modularity solve?

If you think about it, writing code in JS is mostly about managing variables: assigning a value to a variable, changing a variable’s value, or combining two variables and storing the result in a third.

Because so much of your code is about variables, how you organize them is critical to the quality of your code. Things are easier when you only have to think about a few variables at a time, and the mechanism JS provides for this is scope. Because of function scope, code in one function cannot access variables declared inside another function.

This way, when we write a function, we don’t have to worry about other functions changing the variables inside it. The downside is that it becomes difficult to share variables between functions. What if you do want to share them? One way is to place the shared variables in a scope above both functions, such as the global scope.
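The trade-off described above can be sketched in a few lines. The function and variable names here are made up for illustration.

```javascript
// A variable in an outer scope is visible to both functions.
let sharedTotal = 0;

function addFive() {
  const step = 5; // local to addFive, invisible everywhere else
  sharedTotal += step;
}

function canSeeStep() {
  // typeof on an undeclared name yields 'undefined' instead of throwing.
  return typeof step !== 'undefined';
}

addFive();
console.log(sharedTotal);  // 5
console.log(canSeeStep()); // false
```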

Remember jQuery? Before any code that used jQuery could run, jQuery itself had to be in the global scope.

That works, but it has drawbacks. The scripts must be loaded in the right order. If code that uses jQuery runs before jQuery has loaded, the program throws an error because jQuery cannot be found.

This also makes old code harder to maintain: before deleting any of it, you have to work out whether removing it will break something else, because these dependencies are implicit.

Another problem is that variables in the global scope can be overwritten by any other code. For example, malicious code can overwrite a variable in the global scope and cause your site to fail.
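Both drawbacks can be reproduced in a small sketch. It assumes a plain object, fakeGlobal, as a stand-in for the browser's global scope, and a stub function in place of the real jQuery.

```javascript
// fakeGlobal is a hypothetical stand-in for window; the jQuery stub is not the real library.
const fakeGlobal = {};

function pluginCode() {
  // Implicit dependency: nothing here says where jQuery comes from.
  return fakeGlobal.jQuery('#app');
}

// Drawback 1: running the plugin before the library is loaded fails.
let loadOrderError = null;
try {
  pluginCode();
} catch (error) {
  loadOrderError = error;
}

// Loading the "library" fixes the call.
fakeGlobal.jQuery = (selector) => `selected ${selector}`;
const resultAfterLoad = pluginCode();

// Drawback 2: any other code can overwrite the global.
fakeGlobal.jQuery = () => 'overwritten';
const resultAfterOverwrite = pluginCode();

console.log(loadOrderError instanceof TypeError); // true
console.log(resultAfterLoad);                     // selected #app
console.log(resultAfterOverwrite);                // overwritten
```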

1.2 How can modularity help?

Modularity provides a better way to organize variables and functions: a module groups together the variables and functions that belong together. They live in the module scope, where the functions in the module can share the variables.

But a module scope differs from a function scope in that it has a way to make its variables available to other modules. The module states explicitly which of its variables, functions, and so on other modules may use.

When something is made available to other modules, it is called an export. Once a module has exports, other modules can explicitly state which of its variables or functions they depend on.

// module A
export const counter = 0
// module B
import { counter } from './moduleA.js'

This is an explicit dependency: if you remove a module, you know exactly which other modules will break.

With modularity, code can be broken down into small, independent modules that can be combined, like Lego bricks, into a large project.

This is why modularity matters so much. The community has proposed several module specifications over the years, including CJS (Node.js), AMD (RequireJS), CMD (Sea.js), and ESM. ESM is the one that has been added to the JS specification, and Node is currently adapting to it.

Let’s take a closer look at how ESM works.

2. How does ESM work

When you develop with modules, you build up a dependency graph. The connections between the different dependencies come from import statements.

These import statements are how the browser or Node knows what code to load. You give it the address of an entry file, and it follows the import statements from there.

But the browser cannot use the files themselves directly. It has to parse each one into a data structure called a module record so that it knows what is going on in the file.

The module record is then turned into a module instance, which combines two things: code and state.

The code is basically a set of instructions, like a recipe for a dish. But a recipe alone is not enough; you also need the ingredients.

The state provides those ingredients. State is the actual value of each variable at any point in time. Of course, a variable is just a name for a location in memory that holds the value.

A module instance thus combines the code (the list of instructions) with the state (the values of all the variables).

So loading a module means going from a file to a module instance. For ESM, this happens in three phases:

  1. Construction: Find all the files, download them, and parse them into module records.
  2. Instantiation: Find places in memory for all exported values (without filling in the values yet), then point both the exports and the imports at those places. This is called linking.
  3. Evaluation: Run the code to fill those places in memory with actual values.
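The three phases can be sketched as a toy loader. This is only an illustration under simplifying assumptions, not how an engine is implemented: the "files" are an in-memory object, each file already lists its imports and export names, and its top-level code is a plain function called run.

```javascript
// Hypothetical in-memory "files"; a real loader would fetch and parse source text.
const sources = {
  'main.js': {
    imports: ['counter.js'],
    exportNames: ['message'],
    run: (own, imported) => { own.message = `count is ${imported['counter.js'].count}`; },
  },
  'counter.js': {
    imports: [],
    exportNames: ['count'],
    run: (own) => { own.count = 5; },
  },
};

// 1. Construction: find every file reachable from the entry and create a record for it.
function construct(entry, records = new Map()) {
  if (records.has(entry)) return records;
  const source = sources[entry];
  records.set(entry, { url: entry, ...source, bindings: null, evaluated: false });
  for (const dep of source.imports) construct(dep, records);
  return records;
}

// 2. Instantiation: reserve a slot for every export, without filling it in.
function instantiate(records) {
  for (const record of records.values()) {
    record.bindings = {};
    for (const name of record.exportNames) record.bindings[name] = undefined;
  }
}

// 3. Evaluation: run dependencies first, then the module itself, exactly once.
function evaluate(url, records) {
  const record = records.get(url);
  if (record.evaluated) return;
  record.evaluated = true;
  for (const dep of record.imports) evaluate(dep, records);
  const imported = Object.fromEntries(
    record.imports.map((dep) => [dep, records.get(dep).bindings]),
  );
  record.run(record.bindings, imported);
}

const records = construct('main.js');
instantiate(records);
const countBeforeEvaluation = records.get('counter.js').bindings.count;
evaluate('main.js', records);

console.log(countBeforeEvaluation);                   // undefined
console.log(records.get('main.js').bindings.message); // count is 5
```

Because each phase is a separate function, a host is free to run them at different times, which is the property the text describes.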

People often say that ES modules are asynchronous. You can think of them that way because the work is split into three phases (construction, instantiation, and evaluation) that can be done separately.

CJS has no such split: a module and its dependencies are loaded, instantiated, and evaluated all at once, without any breaks in between.

The steps themselves are not necessarily asynchronous, however. They can be done synchronously, depending on what is doing the loading. That is because not everything is controlled by the ESM specification. The work is actually divided into two parts, covered by different specifications.

  • The ESM specification describes how a file should be parsed into a module record and how the module should be instantiated and evaluated. It does not say how to obtain the files in the first place.
  • Fetching the files is the job of the loader, which is defined by a different specification on each platform. In the browser, the loader is defined by the HTML specification.

The loader also controls exactly how the modules are loaded by calling the ESM methods ParseModule, Module.Instantiate, and Module.Evaluate. It is a bit like a puppeteer pulling the strings of the JS engine.

Let’s take a closer look at what happens at each step.

2.1 Construction

During Construction, three things happen to each module:

  • Figure out where to download the module file from (also known as module resolution)
  • Fetch the file (by downloading it from a URL or loading it from the file system)
  • Parse the file into a module record

Finding and fetching files

In HTML, a <script> tag tells the browser where the entry file (main.js in this example) is.

But how does the loader find the modules that main.js depends on? This is where the import statement comes in. One part of the import statement, called the module specifier, tells the loader where to find the next module.

One thing to note about module specifiers is that they are handled differently in the browser and in Node.js. Each host has its own way of interpreting module specifier strings, using what is called a module resolution algorithm, and these algorithms differ from platform to platform.

Browsers only accept URLs as module specifiers and load the module file from that URL. But the whole graph cannot be fetched at once: until a file has been parsed, you do not know which modules it depends on, and you cannot parse a file until it has been fetched.

This means going through the tree layer by layer: parse one file, work out its dependencies, then find and fetch those dependencies.
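A minimal sketch of this layer-by-layer walk is shown here. It assumes the files live in an in-memory object (fileContents, with made-up URLs) instead of being downloaded, and it finds specifiers with a rough regular expression where a real engine would use a full parser.

```javascript
// Hypothetical file contents keyed by URL; a real loader would fetch these over the network.
const fileContents = {
  '/main.js': "import { count } from '/counter.js';\nimport { render } from '/display.js';",
  '/counter.js': "import { render } from '/display.js';\nexport const count = 5;",
  '/display.js': 'export function render() {}',
};

// Rough specifier extraction for simple `import ... from '...'` statements only.
function findSpecifiers(sourceText) {
  const pattern = /import\s+[^'"]*from\s+['"]([^'"]+)['"]/g;
  return [...sourceText.matchAll(pattern)].map((match) => match[1]);
}

function buildGraph(entryUrl) {
  const graph = new Map(); // URL -> list of dependency URLs
  let layer = [entryUrl];
  while (layer.length > 0) {
    const nextLayer = [];
    for (const url of layer) {
      if (graph.has(url)) continue; // already handled in an earlier layer
      // Dependencies are only known once the file has been "fetched" and parsed.
      const dependencies = findSpecifiers(fileContents[url]);
      graph.set(url, dependencies);
      nextLayer.push(...dependencies);
    }
    layer = nextLayer;
  }
  return graph;
}

const graph = buildGraph('/main.js');
console.log([...graph.keys()]);       // [ '/main.js', '/counter.js', '/display.js' ]
console.log(graph.get('/counter.js')); // [ '/display.js' ]
```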

If the main thread waited for each of these files to download, it would be blocked, because downloading takes a long time.

Blocking the main thread like this would make an application that uses modules too slow to use. This is one of the reasons ESM splits the algorithm into multiple phases.

Splitting the algorithm into phases is one of the main differences between ESM and CJS.

CJS can do things differently because reading a file from the file system is much faster than downloading it over the network, so Node.js can afford to block the main thread while it loads a file. And since the file is already loaded, it makes sense to instantiate and evaluate it right away (in CJS, these are not separate phases). This means that in CJS, you walk the whole dependency tree, loading, instantiating, and evaluating every dependency, before the module is returned.

One consequence is that in CJS, the module specifier passed to require can contain variables. All the code in the module up to the require call has already run before the next module is looked up, so the variable already has a value when module resolution happens.

With ESM, however, the entire module graph is built before any evaluation takes place. This means a module specifier in an import statement cannot contain variables, because those variables do not have values yet.

But variables in module specifiers are sometimes genuinely useful, for example when you only want to load a dependency at a certain point or under certain conditions.

To make this possible in ESM, dynamic import was introduced (it started as a proposal and became part of the language in ES2020). With it, you can write an import like this:

import(`${path}/foo.js`)

Any file loaded with import() is treated as the entry point of a separate dependency graph, and that graph is processed separately.

Note, however, that a module that appears in both graphs shares a single module instance. This is because the loader caches module instances: within a given global scope, there is only one instance of each module. That means less work for the engine, because a module file is fetched only once even if several modules depend on it. (This is one reason for caching modules; another comes up in the evaluation section.)
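Here is a small sketch of a computed specifier and of instance sharing. It assumes a Node.js environment, where node:os and node:path are built-in modules; the helper loadBuiltin and the flag wantsOsInfo are made up for the example.

```javascript
// Assumes Node.js: 'node:os' and 'node:path' are built-in modules there.
function loadBuiltin(name) {
  // The specifier is computed at run time, which a static import statement does not allow.
  return import(`node:${name}`);
}

const wantsOsInfo = true; // hypothetical condition decided at run time
const chosenModule = loadBuiltin(wantsOsInfo ? 'os' : 'path');

// Two dynamic imports of the same module resolve to the same namespace object,
// because the loader caches the module instance.
const sharedInstance = Promise.all([loadBuiltin('path'), loadBuiltin('path')])
  .then(([first, second]) => first === second);

chosenModule.then((os) => console.log(typeof os.platform)); // function
sharedInstance.then((same) => console.log(same));           // true
```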

The loader manages this cache using a data structure called a module map. When the loader goes to fetch a URL, it puts that URL in the module map and notes that the file is currently being fetched. It then sends out the request and moves on to the next file.

What happens if another module depends on the same file? The loader looks up each URL in the module map, and if it sees that the file is already being fetched, it moves on to the next URL.

The module map does more than keep track of which files are being fetched. It also serves as the cache for the modules themselves.
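The bookkeeping can be sketched with a Map. This is a simplified, synchronous illustration: requestModule, finishFetch, and the 'fetching' marker are names invented here, and no real network request is made.

```javascript
// URL -> 'fetching' while the request is in flight, or the module record once it has arrived.
const moduleMap = new Map();
let fetchCount = 0;

function requestModule(url) {
  if (moduleMap.has(url)) {
    // Already being fetched or already fetched: do not start a second request.
    return moduleMap.get(url);
  }
  moduleMap.set(url, 'fetching');
  fetchCount += 1; // stands in for actually sending the request
  return 'fetching';
}

function finishFetch(url, sourceText) {
  // Replace the in-flight marker with the (very simplified) module record.
  moduleMap.set(url, { url, sourceText });
}

requestModule('/counter.js');                            // first importer starts the fetch
const secondRequest = requestModule('/counter.js');      // second importer sees it is in flight
finishFetch('/counter.js', 'export const count = 5;');
const cachedRecord = requestModule('/counter.js');       // later requests get the cached record

console.log(secondRequest);    // fetching
console.log(fetchCount);       // 1
console.log(cachedRecord.url); // /counter.js
```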

Parsing

Now that the file has been fetched, it needs to be parsed into a module record, which helps the browser understand what the different parts of the module are.

Once the module record is created, it is placed in the module map. From then on, whenever the module is requested, the loader can pull the record straight out of the map.

One detail of parsing looks trivial but has big implications. All modules are parsed as if they had “use strict” at the top. There are other small differences too: in a module, the keyword await is reserved in top-level code, and the value of this at the top level is undefined.

Ordinary JS scripts are parsed differently. The way a file is parsed is called its parse goal, and parsing the same file with a different goal gives different results. So before parsing starts, the loader needs to know whether the file is a module.
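The effect of parsing the same code under different rules can be approximated without module files. The sketch below compares sloppy-mode and strict-mode function bodies built with new Function; it illustrates the strict-mode difference that modules opt into, not the module parse goal itself.

```javascript
// new Function compiles its body as sloppy-mode code unless the body opts in to strict mode.
const sloppyThis = new Function('return this;')();
const strictThis = new Function('"use strict"; return this;')();

// The same assignment is accepted in sloppy mode but rejected under strict rules.
let strictAssignmentError = null;
try {
  new Function('"use strict"; undeclaredName = 1;')();
} catch (error) {
  strictAssignmentError = error;
}

console.log(sloppyThis === globalThis);                       // true
console.log(strictThis === undefined);                        // true
console.log(strictAssignmentError instanceof ReferenceError); // true
```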

In the browser, this is simple: you add type="module" to the <script> tag.

In Node, you either give the file the .mjs extension or add "type": "module" to package.json.

By the end of construction, the entry file has been turned into a set of module records, one for each module file.

The next step is to instantiate the module and link all instances together.

2.2 Instantiation

As mentioned earlier, an instance combines code with state. The state lives in memory, so the instantiation step is all about wiring things up to memory.

First, the JS engine creates a module environment record, which manages the variables for the module record. It then finds locations in memory for all the exports, and the module environment record keeps track of which location is associated with each export.

These locations do not hold values yet; they are only filled in during evaluation. There is one caveat: exported function declarations are initialized during this phase, which makes evaluation simpler.

To instantiate the module graph, the engine performs a depth-first, post-order traversal. That is, it goes down to the bottom of the graph, to the modules that do not depend on anything else, and sets up their exports first.

Once the engine has wired up all the exports below a module (all the exports that module depends on), it comes back up a level and wires up that module’s imports.

Note that an export and the matching import point to the same location in memory. Wiring up the exports first guarantees that every import can be connected to a matching export.
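The traversal order can be sketched as a post-order depth-first walk over a small graph. The graph and file names below are invented for the example.

```javascript
// Hypothetical dependency graph: each module lists the modules it imports.
const dependencyGraph = {
  'main.js': ['counter.js', 'display.js'],
  'counter.js': ['display.js'],
  'display.js': [],
};

// Post-order depth-first traversal: a module is added only after everything it depends on.
function linkOrder(entry, graph, visited = new Set(), order = []) {
  if (visited.has(entry)) return order;
  visited.add(entry);
  for (const dependency of graph[entry]) {
    linkOrder(dependency, graph, visited, order);
  }
  order.push(entry);
  return order;
}

const order = linkOrder('main.js', dependencyGraph);
console.log(order); // [ 'display.js', 'counter.js', 'main.js' ]
```

The module with no dependencies comes first and the entry module comes last, so every import is wired up after the export it points to.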

This differs from CJS, where a value placed on the exports object (such as a number) is copied at export time. If the exporting module later changes the variable, the importing module does not see the change.

ESM instead uses “live bindings”. Both modules point to the same location in memory, so when the exporting module changes a value, the change shows up in the importing module. Note, however, that only the exporting module can change the value; the importing module cannot assign to its imports.

The advantage of live bindings is that all the modules can be wired up without running any code, which helps with evaluation when there are circular dependencies.

At the end of this phase, we have connected all instances of export/import variables to memory locations.

Now we are ready to evaluate the code and populate the values into memory.

2.3 Evaluation

The final step is to fill in those locations in memory, which the JS engine does by executing the top-level code (the code outside of functions). Besides filling in values, evaluating the code can also trigger side effects. For example, a module might make a call to a server.

Because of these potential side effects, you only want to evaluate a module once. Instantiation is different: it can be repeated any number of times with exactly the same result. Evaluation, by contrast, can produce different results depending on how many times it is done.

This is one reason to have the module map. It caches each module by URL so that there is only one module record per module, which ensures that each module’s code is executed only once.
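The "evaluate once" guarantee can be sketched with a cache keyed by URL. The function evaluateOnce and the side-effect counter are made up for this illustration.

```javascript
const evaluatedModules = new Map(); // URL -> result of running the top-level code
let serverCalls = 0;                // stands in for a side effect such as a network call

function evaluateOnce(url, topLevelCode) {
  if (!evaluatedModules.has(url)) {
    evaluatedModules.set(url, topLevelCode());
  }
  return evaluatedModules.get(url);
}

// Hypothetical module whose top-level code has a side effect.
const analyticsTopLevel = () => {
  serverCalls += 1;
  return { sessionId: serverCalls };
};

const firstImport = evaluateOnce('/analytics.js', analyticsTopLevel);
const secondImport = evaluateOnce('/analytics.js', analyticsTopLevel);

console.log(serverCalls);                  // 1
console.log(firstImport === secondImport); // true
```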

Evaluation also has to deal with circular dependencies, which show up as cycles in the dependency graph. In practice such a cycle is usually long, but a short one is enough to illustrate the problem: a main module and a counter module that depend on each other.

Let’s first see what happens in CJS. The main module runs up to its require statement and then goes off to load the counter module.

The counter module then tries to read message from the main module’s exports. But since the main module has not been evaluated that far yet, this returns undefined. The JS engine allocates memory for the local variable and sets its value to undefined.

Evaluation continues to the end of the counter module’s top-level code. To see whether message eventually gets the correct value, the counter module sets a timer. Evaluation then resumes in the main module.

There, message is initialized and added to memory. But since there is no connection between the two, the copy in the counter module stays undefined.

If the export were handled with live bindings, the counter module would eventually see the correct value: by the time the timer fires, the main module has finished evaluating and filled in the value.

Supporting cycles like this was a major consideration in the design of ESM, and the three-phase design is what makes them workable.

Conclusion

With the introduction of ESM, JS finally has an official module system, and the other module specifications will gradually be phased out. Node is steadily improving its ESM support, and the outlook for ESM keeps getting better.