This article is translated

Creating and Working with WebAssembly Modules

Originally written by Lin Clark

The original address: hacks.mozilla.org/2017/02/cre…

This is part four of the WebAssembly article series. If you haven’t read the others, I suggest you start from the very beginning.

WebAssembly is a way to run programming languages other than JavaScript on web pages. In the past, when you wanted code running in the browser to interact with the different parts of a web page, your only option was JavaScript.

So when people say “WebAssembly is fast,” the comparison is with JavaScript. But that doesn’t mean it’s an either/or situation, where you use either WebAssembly or JavaScript.

In fact, we expect developers to use both WebAssembly and JavaScript in the same application. Even if you don’t write WebAssembly code yourself, you’ll still benefit from it.

WebAssembly modules define functions that can be called from JavaScript. So, just as you can download a module like lodash from npm today and call the functions that are part of its API, you will be able to download WebAssembly modules in the future.

So, let’s look at how to create a WebAssembly module and how to use it from JavaScript.

Where does WebAssembly fit into the process?

In the article on assembly, I talked about how compilers compile high-level languages into machine code.

Where does WebAssembly fit into this picture?

You might think that WebAssembly is just another target assembly language. That is partly true, except that each of the other languages (x86, ARM) corresponds to a particular machine architecture.

When you deliver code over the web to run on a user’s machine, you don’t know which target architecture that code will run on.

So WebAssembly is a little bit different from those assembly languages. It is a machine language, but it corresponds to a conceptual machine, not a real or physical machine.

For this reason, WebAssembly instructions are sometimes called virtual instructions. They map to machine code much more directly than JavaScript source code does. They represent a sort of intersection of what can be done efficiently across common hardware, rather than a direct mapping to the machine code of one specific machine.

Once the browser has downloaded the WebAssembly code, it is a short hop from WebAssembly to the target machine’s assembly code.

Compiling to .wasm

The compiler toolchain that currently has the most support for WebAssembly is LLVM. A number of different front ends and back ends can be plugged into LLVM.

Note: Most WebAssembly module developers write code in C or Rust and then compile it to WebAssembly, but there are other ways to create a WebAssembly module. For example, there is an experimental tool that lets you build a WebAssembly module using TypeScript, or you can write directly in the WebAssembly text format.

Suppose we want to go from C to WebAssembly. We can use the Clang front end to compile C into LLVM’s intermediate representation (IR). Once the code is in LLVM’s IR, LLVM understands it and can perform some optimizations.

To go from LLVM’s IR to WebAssembly, we need a back end. One is currently in progress in the LLVM project. It is most of the way there and should be finalized soon, but it can be tricky to get working today.

For now, there is another tool called Emscripten that is a bit easier to use. It has its own back end that can produce WebAssembly by compiling to another target (called asm.js) and then converting that to WebAssembly. It uses LLVM under the hood, though, so you can switch between the two back ends from Emscripten.

Emscripten includes many additional tools and libraries that make it possible to port whole C/C++ code bases, so it is more of a software development kit (SDK) than a compiler. For example, systems developers are used to having a file system that they can read from and write to, so Emscripten can simulate a file system using IndexedDB.

Whatever toolchain you use, the end result is a file that ends in .wasm. I’ll cover the structure of the .wasm file later. First, let’s look at how to use it in JS.

Loading a .wasm module in JavaScript

The .wasm file is the WebAssembly module, and it can be loaded in JavaScript. Currently, the loading process is a bit cumbersome.

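function fetchAndInstantiate(url, importObject) {
  // Download the .wasm file and read it as raw bytes.
  return fetch(url).then(response =>
    response.arrayBuffer()
  ).then(bytes =>
    // Compile the bytes and create an instance in one step.
    WebAssembly.instantiate(bytes, importObject)
  ).then(results =>
    // The result holds both the compiled module and the instance.
    results.instance
  );
}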

You can learn more about this in our documentation.

We’re working to make this process easier. We expect to improve the toolchain and integrate with existing module bundlers like webpack and loaders like SystemJS. We believe that loading a WebAssembly module can be as easy as loading a JavaScript module.

However, there is one major difference between WebAssembly modules and JavaScript modules. Currently, functions in WebAssembly can only take numeric types (integers or floating-point numbers) as arguments or return values.
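To make the loading steps and the numbers-only rule concrete, here is a minimal sketch. It skips the network fetch and instantiates a tiny module from bytes assembled by hand for this illustration (an assumption, not compiler output from this article). The module exports a single function, named add42 here, that adds 42 to a 32-bit integer. The synchronous WebAssembly.Module and WebAssembly.Instance constructors are used only because the module is so small; for real modules the promise-based WebAssembly.instantiate call shown above is the better fit.

```javascript
// Hand-assembled module (an assumption for illustration, not compiler output).
// It exports one function, "add42", of type (i32) -> i32.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // Type: one signature, (i32) -> i32
  0x03, 0x02, 0x01, 0x00, // Function: function 0 uses signature 0
  0x07, 0x09, 0x01, 0x05, 0x61, 0x64, 0x64, 0x34, 0x32, 0x00, 0x00, // Export: "add42" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // Code: one body, 7 bytes long, no extra locals
  0x20, 0x00, // get_local 0 (written local.get in current tools)
  0x41, 0x2a, // i32.const 42
  0x6a,       // i32.add
  0x0b,       // end
]);

const wasmModule = new WebAssembly.Module(bytes);
const instance = new WebAssembly.Instance(wasmModule, {});
const add42 = instance.exports.add42;

console.log(add42(8));       // 50
console.log(add42("hello")); // 42: the string cannot cross the boundary, it is coerced to 0
```

The second call shows why strings need another route: the argument is converted to a 32-bit integer before the function ever sees it.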

To work with more complex data types, such as strings, you must use the memory of the WebAssembly module.

If you work mostly with JavaScript, you may not be familiar with direct access to memory. More performant languages like C, C++, and Rust tend to have manual memory management. The WebAssembly module’s memory simulates the heap that you would find in those languages.

To do this, the WebAssembly module uses something in JavaScript called an ArrayBuffer. An array buffer is an array of bytes, so the indexes of the array serve as memory addresses.

If you want to pass a string between JavaScript and WebAssembly, you convert the characters to their character codes and write those into the memory array. Because indexes are integers, an index can be passed to a WebAssembly function. In this way, the index of the first character of the string can be used as a pointer.
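As a rough sketch of that idea, the code below writes a string into a memory’s ArrayBuffer and reads it back. Several details are assumptions made for illustration: the memory is created directly in JavaScript rather than taken from a module, the string is encoded as UTF-8 with TextEncoder, the address 16 is arbitrary, and the helper names writeString and readString are hypothetical. A real module would normally decide where free memory is through its own allocator.

```javascript
// One 64 KiB page of memory. In a real application this would usually be
// the memory exported by (or imported into) the WebAssembly module.
const memory = new WebAssembly.Memory({ initial: 1 });
const heap = new Uint8Array(memory.buffer); // byte view; indexes act as addresses

// Write a string into memory at `ptr` as UTF-8 bytes; returns the byte length.
function writeString(str, ptr) {
  const encoded = new TextEncoder().encode(str);
  heap.set(encoded, ptr);
  return encoded.length;
}

// Read `len` bytes starting at `ptr` back into a JavaScript string.
function readString(ptr, len) {
  return new TextDecoder().decode(heap.subarray(ptr, ptr + len));
}

const ptr = 16; // arbitrary address chosen for this sketch
const len = writeString("hello", ptr);

console.log(ptr, len);             // 16 5: two integers a WebAssembly function can accept
console.log(heap[ptr]);            // 104, the character code of "h"
console.log(readString(ptr, len)); // hello
```

Only the two integers, the pointer and the length, would be passed to the WebAssembly function; the characters themselves stay in memory.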

Anyone who develops a WebAssembly module for web developers to use will probably put a wrapper around that module. That way, as a consumer of the module, you don’t need to know about the details of memory management.

If you want to learn more, check out our documentation on WebAssembly memory.

.wasm file structure

If you write code in a higher-level language and then compile it to WebAssembly, you don’t need to know how the WebAssembly module is structured. But it can help to understand the basics.

If you haven’t read the article on assembly (part 3 of this series), I suggest you do.

Here is a C function that we can convert to WebAssembly:

int add42(int num) {
  return num + 42;
}

You can try compiling this function using the WASM Explorer.

If you open the .wasm file (and your editor supports displaying it), you’ll see something like this.

00 61 73 6D 0D 00 00 00 01 86 80 80 80 00 01 60
01 7F 01 7F 03 82 80 80 80 00 01 00 04 84 80 80
80 00 01 70 00 00 05 83 80 80 80 00 01 00 01 06
81 80 80 80 00 00 07 96 80 80 80 00 02 06 6D 65
6D 6F 72 79 02 00 09 5F 5A 35 61 64 64 34 32 69
00 00 0A 8D 80 80 80 00 01 87 80 80 80 00 00 20
00 41 2A 6A 0B

This is the “binary” representation of the module. I put binary in quotes because it is typically displayed in hexadecimal notation, but that can easily be converted to binary notation or to a human-readable format.

For example, num + 42 corresponds to the final bytes of the dump, 20 00 41 2A 6A (followed by 0B, which ends the function).
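The last six bytes of the dump above (20 00 41 2A 6A 0B) are the instructions of add42, and they can be turned back into text with a small lookup table. This is only a sketch: the table covers just the four opcodes that appear here, and the function name disassemble is hypothetical. The mnemonic get_local is the one tools used when this article was written; current tools write it as local.get.

```javascript
// Opcode table limited to the four instructions in this function body.
// `immediates` is how many one-byte operands follow the opcode here.
const OPCODES = {
  0x20: { name: "get_local", immediates: 1 }, // local.get in current tools
  0x41: { name: "i32.const", immediates: 1 },
  0x6a: { name: "i32.add", immediates: 0 },
  0x0b: { name: "end", immediates: 0 },
};

// Final six bytes of the hex dump: the instructions for `num + 42`.
const body = [0x20, 0x00, 0x41, 0x2a, 0x6a, 0x0b];

function disassemble(code) {
  const lines = [];
  let i = 0;
  while (i < code.length) {
    const op = OPCODES[code[i]];
    if (!op) throw new Error("opcode not covered by this sketch: " + code[i]);
    // Every operand here fits in one byte, so no LEB128 decoding is needed.
    const operands = code.slice(i + 1, i + 1 + op.immediates);
    lines.push([op.name, ...operands].join(" "));
    i += 1 + op.immediates;
  }
  return lines;
}

console.log(disassemble(body).join("\n"));
// get_local 0
// i32.const 42
// i32.add
// end
```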

How the code works: a stack machine

In case you’re wondering, here’s what those instructions do.

You may have noticed that the add operation doesn’t say where its values should come from. This is because WebAssembly is an example of something called a stack machine. This means that all of the values an operation needs are queued up on the stack before the operation is performed.

An operation like add knows how many values it needs. Since add needs two, it takes two values off the top of the stack. Because the add instruction does not need to specify source or destination registers, it can be short (a single byte). This reduces the size of the .wasm file, which means it takes less time to download.
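The stack behavior described above can be sketched as a toy interpreter. This is not how a browser executes WebAssembly; it is a hypothetical run function that handles only the three instructions needed for num + 42, written to show values being pushed before add pops them.

```javascript
// Run a tiny instruction list the way a stack machine would.
function run(instructions, locals) {
  const stack = [];
  for (const [op, arg] of instructions) {
    switch (op) {
      case "get_local": // push the value of a local (here, the parameter)
        stack.push(locals[arg]);
        break;
      case "i32.const": // push a constant
        stack.push(arg);
        break;
      case "i32.add": { // pop two values, push their 32-bit sum
        const b = stack.pop();
        const a = stack.pop();
        stack.push((a + b) | 0);
        break;
      }
      default:
        throw new Error("instruction not covered by this sketch: " + op);
    }
  }
  return stack.pop(); // the value left on the stack is the result
}

const add42Body = [["get_local", 0], ["i32.const", 42], ["i32.add"]];
console.log(run(add42Body, [8])); // 50
```

Notice that the add instruction carries no operands of its own; everything it needs is already on the stack.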

Even though WebAssembly is specified in terms of a stack machine, that’s not how it works on the physical machine. When the browser translates WebAssembly into machine code for the machine it is running on, it uses registers. Because the WebAssembly code doesn’t specify registers, the browser has the flexibility to choose the best register allocation for that machine.

Module sections

Besides the add42 function itself, the .wasm file contains other parts. These are called sections. Some sections are required in any module, while others are optional.

Required:

  1. Type: Contains the function signatures for functions defined in this module and for any imported functions.
  2. Function: Gives an index to each function defined in this module.
  3. Code: Contains the actual function body for each function in this module.

Optional:

  1. Export: Makes functions, memories, tables, and globals available to other WebAssembly modules and to JavaScript. This allows separately compiled modules to be dynamically linked together. This is WebAssembly’s version of a .dll.
  2. Import: Specifies functions, memories, tables, and globals to import from other WebAssembly modules or from JavaScript.
  3. Start: Specifies a function that runs automatically when the WebAssembly module is loaded (basically like a main function).
  4. Global: Declares global variables for the module.
  5. Memory: Defines the memory the module will use.
  6. Table: Makes it possible to map to values outside the WebAssembly module, such as JavaScript objects. This is especially useful for indirect function calls.
  7. Data: Initializes imported or local memory.
  8. Element: Initializes an imported or local table.
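To see which of these sections the add42 module actually contains, the sketch below walks the hex dump shown earlier and prints each section’s name and size. The helper names readVarUint and listSections are hypothetical, and the name table only covers section ids 0 to 11. Note that the version field in this dump is 0x0D rather than 1, so a current engine would probably refuse to instantiate these exact bytes; reading the layout still works.

```javascript
// Section ids 0 to 11 of the binary format, in order.
const SECTION_NAMES = [
  "Custom", "Type", "Import", "Function", "Table", "Memory",
  "Global", "Export", "Start", "Element", "Code", "Data",
];

// The hex dump of the add42 module shown earlier in this article.
const hexDump = `
00 61 73 6D 0D 00 00 00 01 86 80 80 80 00 01 60
01 7F 01 7F 03 82 80 80 80 00 01 00 04 84 80 80
80 00 01 70 00 00 05 83 80 80 80 00 01 00 01 06
81 80 80 80 00 00 07 96 80 80 80 00 02 06 6D 65
6D 6F 72 79 02 00 09 5F 5A 35 61 64 64 34 32 69
00 00 0A 8D 80 80 80 00 01 87 80 80 80 00 00 20
00 41 2A 6A 0B`;
const dumpBytes = hexDump.trim().split(/\s+/).map((h) => parseInt(h, 16));

// Decode an unsigned LEB128 integer starting at `pos`.
function readVarUint(data, pos) {
  let value = 0;
  let shift = 0;
  let byte;
  do {
    byte = data[pos++];
    value |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return { value, next: pos };
}

// Walk the sections that follow the 8-byte header (magic + version).
function listSections(data) {
  const sections = [];
  let pos = 8;
  while (pos < data.length) {
    const id = data[pos];
    const size = readVarUint(data, pos + 1);
    sections.push({ name: SECTION_NAMES[id], size: size.value });
    pos = size.next + size.value; // skip over the section contents
  }
  return sections;
}

console.log(listSections(dumpBytes).map((s) => `${s.name} (${s.size} bytes)`).join("\n"));
// Type (6 bytes)
// Function (2 bytes)
// Table (4 bytes)
// Memory (3 bytes)
// Global (1 bytes)
// Export (22 bytes)
// Code (13 bytes)
```

Along with Type, Function, and Code, this small module carries Table, Memory, Global, and Export sections, which matches the exported memory and function names visible in the dump.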

For more information on sections, here is an article that explains in depth how these sections work.

The next step

Now that you know how to work with WebAssembly modules, let’s look at why WebAssembly is fast.