Origins

WebAssembly began as a side project of a Mozilla employee. In 2010, Alon Zakai, who was working on Firefox for Android at Mozilla, wrote a compiler called Emscripten in his spare time so that he could bring his old C++ game engine to the browser. Emscripten compiles C++ code to JavaScript by way of LLVM IR.

By the end of 2011, Emscripten could already compile large C/C++ projects such as Python and Doom, and Mozilla found the project promising enough to form a team and ask Alon to work on it full-time. In 2013, Alon and others proposed the asm.js specification, a strict subset of JavaScript that gives browsers more room to optimize by “reducing dynamic features” and “adding type hints.” Compared with full JavaScript, this trimmed-down asm.js is lower-level and better suited as a compiler target.

asm.js provides only two main numeric types: 32-bit integers and 64-bit floating-point numbers. Other data, such as strings, booleans, or objects, is stored as numbers in a single memory buffer and accessed through TypedArray views. Type annotations follow fixed patterns: variable | 0 marks an integer, and +variable marks a floating-point number. For example:

function MyAsmModule() {
    "use asm";  // Tell the browser that this is an asm.js module
    function add(x, y) {
        x = x | 0;  // variable | 0 marks x as a 32-bit integer
        y = y | 0;
        return (x + y) | 0;  // the result is coerced back to an integer
    }
    return { add: add };
}

Engines that support asm.js recognize the types ahead of time and can apply aggressive JIT (just-in-time) optimizations or even AOT (ahead-of-time) compilation, dramatically improving performance. Engines that do not support asm.js simply run it as ordinary JavaScript, with the same results.
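To make the two annotations concrete, here is a minimal sketch of an asm.js module (the name MixedAsmModule and its functions are hypothetical) that uses both x | 0 and +x. Because asm.js is a subset of JavaScript, the calls at the end behave the same whether or not the engine validates and precompiles the module.

```javascript
// Hypothetical asm.js module illustrating both type annotations:
//   x | 0  -> 32-bit integer,   +x -> 64-bit double.
function MixedAsmModule() {
  "use asm";
  function addInt(x, y) {
    x = x | 0;          // parameter annotated as int
    y = y | 0;
    return (x + y) | 0; // result coerced back to a 32-bit int
  }
  function addDouble(a, b) {
    a = +a;             // parameter annotated as double
    b = +b;
    return +(a + b);    // result annotated as double
  }
  return { addInt: addInt, addDouble: addDouble };
}

// asm.js-aware engines may compile this ahead of time; other engines just
// run it as ordinary JavaScript and produce the same results.
const asmModule = MixedAsmModule();
console.log(asmModule.addInt(2147483647, 1)); // -2147483648: the sum wraps to 32 bits
console.log(asmModule.addDouble(0.5, 0.25));  // 0.75
```

The integer wraparound in the first call is exactly what the | 0 coercion promises, which is what lets an engine use plain 32-bit machine arithmetic.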

However, asm.js also has obvious drawbacks: it is not "low-level" enough. The code is still text, it is still constrained by JavaScript syntax, and the browser still has to parse the script, interpret it, collect profiling data, JIT-compile it, and so on. A binary format like Java class files would not only shrink files, cutting network transfer and parsing time, but would also give AOT/JIT compilers bytecode that is closer to the machine and easier to compile well.

Google's Chrome team, meanwhile, was also trying to address JavaScript performance, but from a different direction. Chrome offered NaCl (Google Native Client) and Portable NaCl (PNaCl), which let it execute native code directly in a sandbox.

asm.js and NaCl/PNaCl each had their own strengths and weaknesses. Mozilla and Google recognized this, and from 2013 on the two teams communicated and collaborated frequently, eventually deciding to combine the strengths of both projects in a bytecode-based technology. In 2015, WebAssembly was officially named and announced, and the W3C WebAssembly Community Group (with Chrome, Edge, Firefox, and WebKit) was formed to advance it.

Rust added Wasm support in version 1.14, released in 2016. In 2017, Google announced it would deprecate PNaCl, and the then-current versions of Chrome, Edge, Safari, and Firefox all shipped WebAssembly support. In 2018, Go 1.11 was released with Wasm support. In 2019, Emscripten switched to LLVM's WebAssembly backend by default and began phasing out its asm.js-oriented backend, and WebAssembly became a W3C Recommendation, joining HTML, CSS, and JavaScript as the fourth language of the Web.

Introduction

Official definition: WebAssembly (Wasm) is a binary instruction format for a stack-based virtual machine. It is designed as a compilation target for programming languages and can be deployed in both Web client and server applications.

WebAssembly has the following features:

  • A low-level, assembly-like language that runs at near-native speed in all modern desktop browsers and many mobile browsers.
  • Its files are designed to be compact, so they can be transferred and downloaded quickly, and to be parsed and initialized quickly.
  • Designed as a compilation target, so code written in C/C++, Rust, and other languages can run on the Web.

In short, WebAssembly lets code written in many languages run in the browser at near-native speed.

WebAssembly is also designed to coexist and work with JavaScript, addressing several shortcomings of JavaScript (including asm.js):

  • Better performance. Because WebAssembly is a low-level, assembly-like language with statically typed code, the browser can compile it directly to machine code, greatly improving performance. And because it is a compact bytecode format that transfers quickly over the network, browser vendors have introduced "streaming compilation": files are compiled while they download and can be instantiated as soon as the download finishes.
  • Mixing languages. Previously, running another language on the Web meant converting it to JavaScript, which was hard and cost a lot of performance. WebAssembly was designed as a compilation target, so other languages can compile to it without a large performance penalty (though there is still some cost), and existing code is easier to reuse.
  • Stronger code protection. JavaScript can usually be protected only by obfuscation, which hurts readability, but the code can still be read with tools and some extra time. Compiled Wasm is far harder to read: even after decompiling it with tools like wasm2c, analyzing it is much more difficult than analyzing JS. It is not completely safe, of course, but the added reverse-engineering difficulty lowers the risk considerably.

However, WebAssembly is not purely a browser technology. Just as JavaScript has Node.js, WebAssembly now has its own runtimes, and it is used outside the browser in cloud-native, blockchain, security, and other systems applications.

Compilation

C/C++ compiled via Emscripten:

emcc hello.c -o hello.wasm

Rust with Cargo:

rustup target add wasm32-unknown-unknown  # install the Wasm target once
cargo build --target wasm32-unknown-unknown --release

The output can be shrunk further with wasm-gc:

wasm-gc target/wasm32-unknown-unknown/release/hello.wasm

Go, with its built-in toolchain:

GOARCH=wasm GOOS=js go build -o hello.wasm main.go

Running

Running in JavaScript

To run WebAssembly from JavaScript, you first need to load the module's bytes into memory before compiling and instantiating it, for example with XMLHttpRequest or fetch, which provide the binary as an ArrayBuffer.

An example using fetch:

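const importObject = {};  // values the module imports; empty if it imports nothing

fetch('module.wasm').then(response =>
  response.arrayBuffer()
).then(bytes =>
  WebAssembly.instantiate(bytes, importObject)
).then(results => {
  console.log(results.instance.exports);  // the module's exported functions, memories, etc.
});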

That is, you fetch the module binary into an ArrayBuffer and then compile and instantiate it with WebAssembly.instantiate().
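For an illustration that needs no server, the sketch below builds the bytes of a tiny module by hand (equivalent to the WAT in the comment; the export name add is our own choice) and passes them straight to WebAssembly.instantiate(). When given raw bytes rather than a compiled WebAssembly.Module, instantiate() resolves to an object holding both module and instance.

```javascript
// Hand-assembled bytes of a minimal module, equivalent to the WAT:
//   (module
//     (func (export "add") (param i32 i32) (result i32)
//       local.get 0
//       local.get 1
//       i32.add))
const addWasm = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,                               // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00,                               // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: func 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: 1 body of 7 bytes, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0; local.get 1; i32.add; end
]);

const importObject = {}; // this module imports nothing

WebAssembly.instantiate(addWasm, importObject).then((results) => {
  // Given bytes, the promise resolves to { module, instance }.
  const { add } = results.instance.exports;
  console.log(add(2, 3)); // 5
});
```

In a real page the bytes would come from response.arrayBuffer() as in the fetch example; building them inline only keeps the sketch self-contained.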

You can also use WebAssembly.instantiateStreaming(), which compiles and instantiates the module directly from the streamed response, without first converting it to an ArrayBuffer (the server must serve the file with the application/wasm MIME type):

WebAssembly.instantiateStreaming(fetch('simple.wasm'), importObject)
.then(result => {
  console.log(result.instance.exports);
});

WebAssembly also plans to support loading modules directly, as ES modules (the ES module integration proposal).

Running outside the browser

The Wasm community provides a number of runtimes that let Wasm run outside the browser, in a sandboxed environment.

The most popular runtimes:

  • Wasmtime: can be used as a CLI or embedded in other applications, such as IoT or cloud-native ones
  • WebAssembly Micro Runtime (WAMR): a virtual machine aimed more at chips and embedded devices; as its name suggests, it is very small, starting in about 100 microseconds with memory use as low as 100 KB
  • Wasmer: supports running Wasm instances from many more programming languages and has its own package registry, WAPM
  • WasmEdge: Formerly known as SSVM, WasmEdge is optimized for cloud native, edge, and decentralized applications

Core concepts

Modules

The basic unit of a WebAssembly program is the module. The term refers both to the binary form of the code and to its compiled form in the browser.

A large WebAssembly application usually consists of multiple submodules, each with its own independent data, so one submodule cannot tamper with another's data. In addition, the permissions each module may use are granted by the top-level caller, so a third-party submodule cannot be called without the upper-level module knowing about it. This kind of permission management is similar to how an Android app must declare all the permissions it depends on in advance.
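One visible trace of this model is that a module must declare every import it needs up front, and the host can inspect those declarations before granting anything. The minimal sketch below hand-assembles a module that imports a single function env.log (a name invented for this example) and exports run, then lists both with the standard WebAssembly.Module.imports() and WebAssembly.Module.exports() functions.

```javascript
// Hand-assembled module, equivalent to the WAT:
//   (module
//     (import "env" "log" (func (param i32)))
//     (func (export "run") i32.const 42 call 0))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // header
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32) -> (), () -> ()
  0x02, 0x0b, 0x01,                                           // import section: 1 import
  0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, //   "env" "log": function of type 0
  0x03, 0x02, 0x01, 0x01,                                     // one local function of type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,       // export "run" -> func 1 (func 0 is the import)
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // body: i32.const 42; call 0; end
]);

const mod = new WebAssembly.Module(bytes); // compile only; nothing runs yet

// Everything the module needs from outside is declared, so the host can review it first.
for (const imp of WebAssembly.Module.imports(mod)) {
  console.log(`requires ${imp.kind} ${imp.module}.${imp.name}`); // requires function env.log
}
for (const exp of WebAssembly.Module.exports(mod)) {
  console.log(`provides ${exp.kind} ${exp.name}`); // provides function run
}
```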

When other high-level languages are compiled to WebAssembly, the result is a binary module whose file name ends in .wasm and whose contents begin with an 8-byte module header:

0000000: 0061 736d              ; WASM_BINARY_MAGIC
0000004: 0100 0000              ; WASM_BINARY_VERSION

The first four bytes are the "magic number", the string \0asm, which identifies the file as a Wasm module; the next four bytes hold, as a little-endian integer, the version of the Wasm binary format the module uses.
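A loader can use these 8 bytes as a quick sanity check before handing a file to the engine. A minimal sketch (readWasmHeader is a hypothetical helper, not a standard API):

```javascript
// Checks the 8-byte module header and returns the binary format version.
function readWasmHeader(bytes) {
  const MAGIC = [0x00, 0x61, 0x73, 0x6d]; // "\0asm"
  if (bytes.length < 8 || !MAGIC.every((b, i) => bytes[i] === b)) {
    throw new Error("not a WebAssembly module: bad magic number");
  }
  // Bytes 4..7 hold the version as a little-endian uint32.
  return new DataView(bytes.buffer, bytes.byteOffset, 8).getUint32(4, true);
}

const header = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
console.log(readWasmHeader(header)); // 1
```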

Sections

After the module header comes the body of the module, which is divided into sections. Wasm groups each kind of definition (types, functions, exports, code, and so on) into its own section. Every section is optional: a module consisting of only the 8-byte header is valid.

Each section can contain multiple items. The Wasm specification defines 12 sections and assigns each one an ID. Except for custom sections, each section may appear at most once, and sections must appear in ascending order of ID.

Each section is described below:

ID  Section    Description
0   Custom     Stores debugging information (such as function names) and other custom data
1   Type       Stores function signatures (parameter and result types) for imported functions and functions defined in the module
2   Import     Stores each import's module name, field name, and type (for a function, its type index)
3   Function   Stores the type index of each function defined in the module
4   Table      Stores references; function pointers are implemented through tables and the call_indirect instruction. Tables can be imported from or exported to the host environment
5   Memory     Declares the linear memory that holds the program's runtime data; it can be imported from or exported to the host environment
6   Global     Stores global variables
7   Export     Stores the name and index of each exported function (or table, memory, global)
8   Start      Specifies the index of a function to run automatically when the module is instantiated
9   Element    Initializes table contents (for example, with function indices), since a table is not otherwise initialized
10  Code       Stores the local variable declarations and instruction bodies of functions
11  Data       Stores static data used to initialize linear memory
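Because every section starts with a 1-byte ID followed by its payload size (a varuint32), a tool can skip through a module without understanding each section's contents. A minimal sketch of such a walker (listSections and readVarUint32 are our own helper names), run on a tiny hand-assembled module exporting add(i32, i32) -> i32:

```javascript
// Section names by ID (IDs 0-11, matching the table above).
const SECTION_NAMES = ["custom", "type", "import", "function", "table", "memory",
                       "global", "export", "start", "element", "code", "data"];

// Reads an unsigned LEB128 value (varuint32) at `pos`; returns [value, positionAfter].
function readVarUint32(bytes, pos) {
  let result = 0;
  for (let shift = 0; shift < 35; shift += 7) {
    if (pos >= bytes.length) throw new Error("unexpected end of input");
    const byte = bytes[pos++];
    result |= (byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) return [result >>> 0, pos];
  }
  throw new Error("varuint32 is longer than 5 bytes");
}

// Each section is: 1-byte ID, varuint32 payload size, then the payload itself.
function listSections(bytes) {
  const sections = [];
  let pos = 8; // skip the 8-byte header (magic number + version)
  while (pos < bytes.length) {
    const id = bytes[pos++];
    const [size, payloadStart] = readVarUint32(bytes, pos);
    sections.push({ id, name: SECTION_NAMES[id] ?? "unknown", size });
    pos = payloadStart + size; // jump over the payload to the next section
  }
  return sections;
}

// A tiny module exporting add(i32, i32) -> i32: type, function, export and code sections.
const addWasm = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  0x03, 0x02, 0x01, 0x00,
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

for (const s of listSections(addWasm)) console.log(s.id, s.name, s.size);
// 1 type 7, then 3 function 2, 7 export 7, 10 code 9
```

The IDs come out in ascending order, as the ordering rule requires.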

Data types

In the binary encoding, Wasm uses the following data types:

  • Unsigned integers. Three fixed-size non-negative integer types are supported: uint8, uint16, and uint32, where the number is the bit width
  • Variable-length unsigned integers. Three types are supported: varuint1, varuint7, and varuint32. "Variable-length" means the number of bytes depends on the value (LEB128 encoding); the number gives the maximum bit width
  • Variable-length signed integers. As above, but negative values are allowed; three types are supported: varint7, varint32, and varint64
  • Floating-point numbers. As in JavaScript, these use the IEEE 754 format, in 32-bit single precision and 64-bit double precision
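The variable-length types use LEB128: each byte carries 7 bits of the value, lowest bits first, and its high bit (0x80) says whether another byte follows; for signed values, bit 0x40 of the last byte is the sign. A minimal sketch of 32-bit decoders (the function names are our own; varint64 would need BigInt and is left out):

```javascript
// Unsigned LEB128 (varuint1 / varuint7 / varuint32).
function decodeVarUint32(bytes, pos = 0) {
  let result = 0;
  let shift = 0;
  let byte;
  do {
    if (shift > 28) throw new Error("varuint32 too long"); // at most 5 bytes
    byte = bytes[pos++];
    result |= (byte & 0x7f) << shift; // append 7 payload bits
    shift += 7;
  } while (byte & 0x80);              // continuation bit set: keep reading
  return { value: result >>> 0, next: pos };
}

// Signed LEB128 (varint7 / varint32): same layout, then sign-extend if
// bit 0x40 of the last byte is set and fewer than 32 bits were read.
function decodeVarInt32(bytes, pos = 0) {
  let result = 0;
  let shift = 0;
  let byte;
  do {
    if (shift > 28) throw new Error("varint32 too long");
    byte = bytes[pos++];
    result |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  if (shift < 32 && (byte & 0x40)) result |= -1 << shift; // fill high bits with 1s
  return { value: result | 0, next: pos };
}

console.log(decodeVarUint32([0xe5, 0x8e, 0x26]).value); // 624485
console.log(decodeVarInt32([0xc0, 0xbb, 0x78]).value);  // -123456
console.log(decodeVarInt32([0x7f]).value);              // -1
```

Small values such as type or section IDs fit in a single byte, which is why the format stays compact.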

The language itself provides the following numeric types:

  • i32: 32-bit integer
  • i64: 64-bit integer
  • f32: 32-bit floating point
  • f64: 64-bit floating point

Every parameter and local variable must have one of these four value types. A function signature consists of a sequence of zero or more parameter types and a sequence of zero or more result types. In the MVP (minimum viable product) version, a function can have at most one result (the later multi-value feature lifts this restriction). Note that the value types i32 and i64 are not inherently signed or unsigned; how they are interpreted depends on the operator.
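Because i32 carries no signedness, the same bits can mean different numbers depending on the instruction. The hand-assembled sketch below (export names div_s and div_u are our own) exports two otherwise identical functions, one using i32.div_s and one using i32.div_u, and feeds both the bit pattern of -8.

```javascript
// Hand-assembled module exporting two functions with the same (i32, i32) -> i32 signature:
//   div_s: local.get 0; local.get 1; i32.div_s   (operands treated as signed)
//   div_u: local.get 0; local.get 1; i32.div_u   (operands treated as unsigned)
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // header
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x03, 0x02, 0x00, 0x00,                         // function section: 2 funcs of type 0
  0x07, 0x11, 0x02,                                     // export section: 2 exports
  0x05, 0x64, 0x69, 0x76, 0x5f, 0x73, 0x00, 0x00,       //   "div_s" -> func 0
  0x05, 0x64, 0x69, 0x76, 0x5f, 0x75, 0x00, 0x01,       //   "div_u" -> func 1
  0x0a, 0x11, 0x02,                                     // code section: 2 bodies
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6d, 0x0b,       //   i32.div_s
  0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6e, 0x0b,       //   i32.div_u
]);

const { div_s, div_u } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;
console.log(div_s(-8, 2)); // -4: the bits 0xFFFFFFF8 read as signed (-8)
console.log(div_u(-8, 2)); // 2147483644: the same bits read as unsigned (4294967288)
```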

Boolean values are represented as 32-bit integers, with 0 meaning false and any non-zero value meaning true. All other kinds of values, such as strings, must be represented in the module's linear memory.
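A minimal sketch of what "strings live in linear memory" means from the host side: the string is encoded as UTF-8 bytes into the memory buffer, and the two sides exchange only numbers (a pointer and a length). The helper names, the offset 16, and the (pointer, length) convention are assumptions for illustration; the Wasm spec itself does not define a string layout.

```javascript
// A standalone linear memory of 1 page (64 KiB), as a host might create and pass to a module.
const memory = new WebAssembly.Memory({ initial: 1 });

// Writes `text` as UTF-8 at byte offset `ptr` and returns its length in bytes.
function writeString(mem, ptr, text) {
  const encoded = new TextEncoder().encode(text);
  // Create a fresh view on each call: mem.grow() detaches the old ArrayBuffer.
  new Uint8Array(mem.buffer, ptr, encoded.length).set(encoded);
  return encoded.length;
}

// Decodes `len` bytes of UTF-8 starting at `ptr`.
function readString(mem, ptr, len) {
  return new TextDecoder().decode(new Uint8Array(mem.buffer, ptr, len));
}

const len = writeString(memory, 16, "héllo, wasm");
console.log(len);                         // 12: "é" takes two bytes in UTF-8
console.log(readString(memory, 16, len)); // héllo, wasm
```

A Wasm function receiving this string would just get the two i32 values 16 and 12.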

WAT

Wasm binaries are not human-readable. WAT (WebAssembly Text Format) is an equivalent text format based on S-expressions, which can be thought of roughly as the assembly language corresponding to the binary.

The developer tools in some browsers can display Wasm as WAT for debugging. The community also provides mature conversion tools such as wasm2wat and wat2wasm, found in WABT (the WebAssembly Binary Toolkit). So it is also possible to write WAT by hand and convert it directly to Wasm.

WASI

That WebAssembly was built for the Web doesn't mean it can only run in browsers. Developers want to take it beyond the browser, which requires an interface for interacting with the operating system.

Because WebAssembly is an assembly language for a conceptual machine rather than a physical one, it offers a fast, scalable, and secure way to run the same code on every machine. To run on all the different operating systems, WebAssembly needs a system interface for that conceptual machine rather than for any single operating system. So developers defined a unified standard for communicating with different operating systems, called WASI (WebAssembly System Interface): an engine-independent API standard for non-Web systems, designed specifically for Wasm.

WASI’s design follows two principles:

  • Portability. Code compiles once into a portable binary that runs on different machines, which makes it easier to distribute. For example, if Node's native modules were written in WebAssembly, users installing an application with native modules would not need to run node-gyp, and developers would not need to configure and distribute dozens of binaries.
  • Security. When code asks the operating system to perform input or output, the operating system must decide whether the requested action is safe. WebAssembly uses a sandbox: code cannot interact with the operating system directly, and the host (a browser or a Wasm runtime) must place into the sandbox the functions the code is allowed to use. The host can therefore restrict what each program can do, one program at a time. A sandbox does not by itself make the system more secure (the host can still put all of its capabilities into the sandbox), but it at least gives the host the option of building a more secure system.
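The host-side control described in the security principle can be sketched with the plain JavaScript API. The module below imports one function, env.log (a name made up for this sketch), and exports run(), which calls log(42). The host decides whether that capability enters the sandbox; if it is withheld, instantiation fails with a LinkError and the code never runs.

```javascript
// Hand-assembled module, equivalent to the WAT:
//   (module
//     (import "env" "log" (func (param i32)))
//     (func (export "run") i32.const 42 call 0))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00,
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,
  0x03, 0x02, 0x01, 0x01,
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b,
]);
const wasmModule = new WebAssembly.Module(bytes);

// Granting the capability: the host supplies its own implementation of env.log.
const logged = [];
const granted = new WebAssembly.Instance(wasmModule, { env: { log: (n) => logged.push(n) } });
granted.exports.run();
console.log(logged); // [ 42 ]

// Withholding it: the module cannot be instantiated at all.
let linkError = null;
try {
  new WebAssembly.Instance(wasmModule, { env: {} });
} catch (err) {
  linkError = err;
}
console.log(linkError instanceof WebAssembly.LinkError); // true
```

The module can only reach what the host passed in; it has no other route to the outside world.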

Based on these two principles, WASI is designed as a modular set of standard interfaces. The most basic core module is wasi-core; other subsets, such as sensors, crypto, processes, and multimedia, are organized as separate modules.

wasi-core contains the basic interfaces every program needs. It covers roughly the same ground as POSIX, providing WASI abstractions of system calls for files, network connections, clocks, random numbers, and so on.

WASI adds a "system call abstraction layer" between Wasm bytecode and the virtual machine. For example, when C/C++ source code that calls fopen is compiled against wasi-libc (a C standard library implemented specifically for WASI), the call to fopen is carried out internally by calling a function named __wasi_path_open, which is an abstraction of the actual system call.

The main job of WASI is to define standard import interfaces and to provide concrete implementations of those common imports on different systems (much as libc is implemented separately for each operating system). Following the same design idea, higher-level WADSI (WebAssembly Domain Specific Interfaces) could be provided for different domains, exposing domain-specific functionality as imports that developers can use directly.
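To make "standard import plus per-system implementation" concrete: WASI preview 1 defines fd_write(fd, iovs, iovs_len, nwritten) as an import from the wasi_snapshot_preview1 module, and each host supplies its own implementation. The sketch below is a toy JavaScript implementation that supports only stdout and stderr and collects the text into an array; it is not how any particular runtime implements it, and makeFdWrite is our own helper name. The memory setup at the end simulates what a compiled program would do before calling the import.

```javascript
const ERRNO_SUCCESS = 0;
const ERRNO_BADF = 8; // WASI errno for "bad file descriptor"

// Toy host implementation of WASI preview 1's fd_write on the module's linear memory.
// `iovs` points to `iovsLen` consecutive 8-byte ciovec structs: { buf: u32 pointer, buf_len: u32 }.
function makeFdWrite(memory, output) {
  return function fd_write(fd, iovs, iovsLen, nwrittenPtr) {
    if (fd !== 1 && fd !== 2) return ERRNO_BADF; // only stdout/stderr in this sketch
    const view = new DataView(memory.buffer);
    let written = 0;
    for (let i = 0; i < iovsLen; i++) {
      const buf = view.getUint32(iovs + i * 8, true);
      const bufLen = view.getUint32(iovs + i * 8 + 4, true);
      // Decoding each chunk separately is fine for ASCII; a real host would buffer bytes.
      output.push(new TextDecoder().decode(new Uint8Array(memory.buffer, buf, bufLen)));
      written += bufLen;
    }
    view.setUint32(nwrittenPtr, written, true); // report the byte count back to the module
    return ERRNO_SUCCESS;
  };
}

// In practice a WASI module usually exports its memory; here we create one to simulate it.
const memory = new WebAssembly.Memory({ initial: 1 });
const heap = new Uint8Array(memory.buffer);
heap.set(new TextEncoder().encode("Hello, "), 100);
heap.set(new TextEncoder().encode("WASI\n"), 200);
const view = new DataView(memory.buffer);
view.setUint32(0, 100, true); view.setUint32(4, 7, true);  // ciovec 0: "Hello, "
view.setUint32(8, 200, true); view.setUint32(12, 5, true); // ciovec 1: "WASI\n"

const output = [];
const importObject = { wasi_snapshot_preview1: { fd_write: makeFdWrite(memory, output) } };
const errno = importObject.wasi_snapshot_preview1.fd_write(1, 0, 2, 16);
console.log(errno);                   // 0
console.log(output.join(""));         // Hello, WASI
console.log(view.getUint32(16, true)); // 12
```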

Security

One source of WebAssembly's security is that it was the first language to share the JavaScript VM, which is sandboxed at runtime and has undergone years of validation and security testing. WebAssembly modules can access only what JavaScript can access and follow the same security rules, including policies such as the same-origin policy.

Unlike desktop applications, WebAssembly modules have no direct access to device memory. Instead, the runtime environment passes an ArrayBuffer to the module during initialization. The module uses this ArrayBuffer as its linear memory, and the WebAssembly runtime checks accesses to ensure the code never goes outside the array's bounds.
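The bounds checking can be observed directly. The hand-assembled sketch below (export names mem and load are our own) exports a one-page memory and a function that reads an i32 at a given address. The host sees the memory as an ordinary ArrayBuffer, and a read that would cross its end traps with a WebAssembly.RuntimeError instead of touching anything outside.

```javascript
// Hand-assembled module, equivalent to the WAT:
//   (module
//     (memory (export "mem") 1)                   ;; 1 page = 65536 bytes
//     (func (export "load") (param i32) (result i32)
//       local.get 0
//       i32.load))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // header
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f, // type section: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                         // one function of type 0
  0x05, 0x03, 0x01, 0x00, 0x01,                   // memory section: min 1 page, no max
  0x07, 0x0e, 0x02,                               // export section: 2 exports
  0x03, 0x6d, 0x65, 0x6d, 0x02, 0x00,             //   "mem"  -> memory 0
  0x04, 0x6c, 0x6f, 0x61, 0x64, 0x00, 0x00,       //   "load" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                   // code section: 1 body of 7 bytes, no locals
  0x20, 0x00, 0x28, 0x02, 0x00, 0x0b,             // local.get 0; i32.load align=2 offset=0; end
]);
const { mem, load } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

// The host writes into linear memory through an ordinary ArrayBuffer view.
new DataView(mem.buffer).setInt32(8, 12345, true); // Wasm memory is little-endian
console.log(load(8)); // 12345

// Bytes 65534..65537 would extend past the last valid byte (65535), so the engine traps.
let trapped = false;
try {
  load(65534);
} catch (err) {
  trapped = err instanceof WebAssembly.RuntimeError;
}
console.log(trapped); // true
```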

Items stored in tables, such as function pointers, are likewise not directly accessible to the WebAssembly module. Code uses an index to ask the WebAssembly runtime for an item; the runtime then accesses the memory and executes the item on the code's behalf.
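The table mechanism can be seen in a hand-assembled sketch: an element segment places two functions into a table, and an exported function (named call here, our choice) invokes one of them with call_indirect, knowing only its index. An index outside the table traps instead of reaching arbitrary memory.

```javascript
// Hand-assembled module, equivalent to the WAT:
//   (module
//     (type $ret_i32 (func (result i32)))
//     (table 2 funcref)
//     (elem (i32.const 0) $ten $twenty)           ;; table[0] = $ten, table[1] = $twenty
//     (func $ten (result i32) i32.const 10)
//     (func $twenty (result i32) i32.const 20)
//     (func (export "call") (param i32) (result i32)
//       local.get 0
//       call_indirect (type $ret_i32)))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // header
  0x01, 0x0a, 0x02,                                           // type section: 2 types
  0x60, 0x00, 0x01, 0x7f,                                     //   type 0: () -> i32
  0x60, 0x01, 0x7f, 0x01, 0x7f,                               //   type 1: (i32) -> i32
  0x03, 0x04, 0x03, 0x00, 0x00, 0x01,                         // function section: types 0, 0, 1
  0x04, 0x04, 0x01, 0x70, 0x00, 0x02,                         // table section: 1 funcref table, min 2
  0x07, 0x08, 0x01, 0x04, 0x63, 0x61, 0x6c, 0x6c, 0x00, 0x02, // export "call" -> func 2
  0x09, 0x08, 0x01, 0x00, 0x41, 0x00, 0x0b, 0x02, 0x00, 0x01, // element: table[0..1] = funcs 0, 1
  0x0a, 0x13, 0x03,                                           // code section: 3 bodies
  0x04, 0x00, 0x41, 0x0a, 0x0b,                               //   func 0: i32.const 10
  0x04, 0x00, 0x41, 0x14, 0x0b,                               //   func 1: i32.const 20
  0x07, 0x00, 0x20, 0x00, 0x11, 0x00, 0x00, 0x0b,             //   func 2: local.get 0; call_indirect type 0, table 0
]);
const callByIndex = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports.call;

console.log(callByIndex(0)); // 10
console.log(callByIndex(1)); // 20

// Index 2 is outside the table, so the runtime refuses the call.
let trapped = false;
try {
  callByIndex(2);
} catch (err) {
  trapped = err instanceof WebAssembly.RuntimeError;
}
console.log(trapped); // true
```

The module never holds a raw address for either function; the index is all it has, and the runtime validates it on every call.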

In native C++, the execution stack lives in the same memory as the program's other data, and although C++ code is not supposed to modify it, it can do so through pointers. WebAssembly's execution stack is kept separate from linear memory and cannot be accessed by the code.

Use cases

Google Earth 9.0 was released in 2017. It was built with NaCl, so at the time it ran only in Chrome. In 2020 Google rebuilt it by compiling its C++ code to WebAssembly, and it has since run in Firefox and Edge as well.

AutoCAD is a well-known desktop design application with a history of nearly 40 years, widely used in civil engineering, interior design, industrial drafting, and other fields. The Web version of AutoCAD was released in 2014 using Google Web Toolkit (GWT), Google's toolset for building Web applications in Java: the Java code of the Android app was translated into JavaScript, but the generated code was very large and ran inefficiently in the browser. In 2015, the core of the original C++ code was compiled directly to the Web through asm.js, greatly improving performance. In March 2018, the WebAssembly-based AutoCAD Web app was launched.

Figma is a browser-based collaborative UI design tool. Its core interface is rendered in a Canvas, and the Canvas interaction is driven by Wasm. Being browser-based makes it easy to run across platforms, while WebAssembly gives it enough performance to be faster than comparable native desktop tools, even on the Web.

Conclusion

As you can see, WebAssembly is not meant to replace JavaScript entirely, but to complement Web technologies, making up for JavaScript's limitations in performance and code reuse. As the saying in the WebAssembly community goes, "Everything that can be done with WebAssembly will be done with WebAssembly." The ultimate goal of WebAssembly is for code in any language to be compiled to it and run efficiently on any platform. Most importantly, it is backed by major vendors such as Google, Mozilla, and Microsoft, and I believe it has a bright future.
