Marty Kalin

Translation: Crazy geek

Original text: opensource.com/article/19/…

Reproduced without permission

There is a technique for translating a non-Web program, written in a high-level language, into a Web-ready binary module without any change to the non-Web program's source code. A browser can efficiently download the newly translated module and execute it in a sandbox. The executing Web module can interact seamlessly with other Web technologies, JavaScript (JS) in particular. Welcome to WebAssembly.

As befits a language with assembly in its name, WebAssembly is low-level. But this low-level character encourages optimization: the just-in-time (JIT) compiler of the browser's virtual machine translates portable WebAssembly code into fast, platform-specific machine code. As a result, a WebAssembly module becomes an executable suited to compute-bound tasks, such as number crunching.

Many high-level languages can be compiled into WebAssembly, and the list is growing, but the initial candidates were C, C++, and Rust. Call these three the system languages, because they are meant for systems programming and high-performance applications programming. The system languages share two features that make them well suited to compilation into WebAssembly. The next section covers the details, which sets up full code examples (in C and TypeScript) together with samples from WebAssembly's own text format language.

Explicit data types and garbage collection

The three system languages require explicit data types, such as int and double, for variable declarations and for values returned from functions. For example, the following code segment illustrates 64-bit addition in C:

long n1 = random();
long n2 = random();
long sum = n1 + n2;

The library function random declares a return type of long:

long random(void); /* returns a long */

During compilation, the C source is translated into assembly language, which is then translated into machine code. In the AT&T flavor of assembly language, the last C statement above does something like the following (## is the assembly language comment symbol):

addq %rax, %rdx ## %rdx = %rdx + %rax (64-bit addition)

%rax and %rdx are 64-bit registers, and the addq instruction means add quadwords, where a quadword is 64 bits in size, the standard size for a C long. Assembly language underscores that executable machine code involves types, with the type given through a mix of the instruction and the arguments, if any. In this case, the add instruction is addq (64-bit addition) rather than, for example, addl, which adds the 32-bit values typical of a C int. The registers in use are the full 64-bit ones (%rax and %rdx) rather than their 32-bit lower halves (for example, %eax is the lower 32 bits of %rax, and %edx is the lower 32 bits of %rdx).

The assembly-language addition performs well because the operands are stored in CPU registers, and a reasonable C compiler (even at the default optimization level) would generate assembly code equivalent to that shown here.

The emphasis on explicit types in the three system languages makes them ideal for compilation into WebAssembly, because that language also has explicit data types: i32 for a 32-bit integer value, f64 for a 64-bit floating-point value, and so on.

Explicit data types also encourage the optimization of function calls. A function with explicit data types has a signature, which specifies the data types of the parameters and of the value (if any) returned from the function. Below is the signature of a WebAssembly function named $add, written in the WebAssembly text format language discussed below. The function takes two 32-bit integers as arguments and returns a 64-bit integer:

(func $add (param $lhs i32) (param $rhs i32) (result i64))

The browser's JIT compiler should keep the 32-bit integer arguments and the returned 64-bit value in registers of the appropriate sizes.

When it comes to high-performance Web code, WebAssembly is not the only choice. For example, asm.js is a JS dialect designed, like WebAssembly, to approach native speed. The asm.js dialect invites optimization because the code mimics the explicit data types of the three languages above. Here is an example with C and asm.js. The sample function in C is:

int f(int n) {       /** C **/
  return n + 1;
}

Both the parameter n and the returned value are explicitly typed as int. The equivalent function in asm.js is:

function f(n) {      /** asm.js **/
  n = n | 0;
  return (n + 1) | 0;
}

JS, in general, does not have explicit data types, but a bitwise-OR operation in JS yields an integer value. This explains the otherwise pointless bitwise-OR operation:

n = n | 0;  /* bitwise-OR of n and zero */

The bitwise or operation between n and 0 yields n, but the purpose here is to indicate that n holds an integer value. The return statement repeats this optimization technique.
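The effect of the `| 0` idiom can be checked directly. The following is a minimal sketch, written in TypeScript rather than asm.js; the helper name toInt32 is an assumption introduced only for illustration, and f is a TypeScript rendering of the asm.js function above.

```typescript
// Coerce an arbitrary JS number to a signed 32-bit integer, as asm.js does.
// (toInt32 is a hypothetical helper name, not part of asm.js.)
function toInt32(x: number): number {
  return x | 0;
}

// TypeScript rendering of the asm.js function f shown above.
function f(n: number): number {
  n = n | 0;          // parameter annotation: n is treated as an int
  return (n + 1) | 0; // result annotation: the sum is treated as an int
}

console.log(toInt32(7));     // 7: integers pass through unchanged
console.log(toInt32(3.9));   // 3: the fractional part is dropped
console.log(toInt32(-3.9));  // -3: truncation is toward zero
console.log(f(41));          // 42
console.log(f(2147483647));  // -2147483648: wraps like a 32-bit int
```

The last call shows that the coerced result wraps around at the 32-bit boundary, which is the behavior a C int addition would typically exhibit on overflow.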

Among JS dialects, TypeScript stands out in terms of explicit data types, which makes the language attractive for compiling into WebAssembly. (The following code example illustrates this.)

A second feature of the three system languages is that they execute without a garbage collector (GC). For dynamically allocated memory, the Rust compiler generates the allocate and free code automatically; in the other two system languages, the programmer who dynamically allocates memory is responsible for explicitly freeing it. The system languages thereby avoid the overhead and complexity of automated GC.

This overview of WebAssembly can be summed up as follows. Almost every article on the WebAssembly language mentions near-native speed as one of the language's main goals. Native speed is the speed of compiled system languages, which is why these three languages were the initial candidates for compilation into WebAssembly.

WebAssembly, JavaScript and separation of concerns

The WebAssembly language is not meant to replace JS but to complement it by delivering better performance on compute-bound tasks. WebAssembly also has an advantage when it comes to downloading. Browsers fetch JS modules as text, an inefficiency that WebAssembly addresses. A module in WebAssembly is a compact binary format, which speeds up downloading.

Just as interesting is how JS and WebAssembly are meant to work together. JS is designed to read and write the Document Object Model (DOM), the tree representation of a web page. WebAssembly, by contrast, has no built-in functionality for the DOM, but WebAssembly can export functions that JS then calls as needed. This separation of concerns means a clean division of labor:

DOM<----->JS<----->WebAssembly

JS, in whatever dialect, should manage the DOM, but JS can also use the general-purpose functionality delivered through WebAssembly modules. A code example helps to illustrate; the code examples in this article are available on my website (condor.depaul.edu/mkalin).

Hailstone sequence and Collatz conjecture

A production-grade code example would have WebAssembly code perform a heavy compute-bound task, such as generating large cryptographic key pairs or using such pairs for encryption and decryption. A simpler example serves the same purpose here.

Consider the function hstone (for hailstone), which takes a positive integer as an argument. This function is defined as follows:

             3N + 1 if N is odd
hstone(N) =
             N/2 if N is even

For example, hstone(12) returns 6, whereas hstone(11) returns 34. If N is odd, then 3N + 1 is even; but if N is even, then N/2 can be either even (e.g., 4/2 = 2) or odd (e.g., 6/2 = 3).

The hstone function can iterate by passing the return value as the next argument. The result is a hailstone sequence, such as this one, which starts with 24 as the original argument, returns 12 as the next argument, and so on:

24,12,6,3,10,5,16,8,4,2,1,4,2,1,...

It takes 10 calls for the sequence to converge to 1, after which the sequence 4, 2, 1 repeats indefinitely: (3 x 1) + 1 is 4, which is halved to yield 2, which is halved to yield 1. Plus magazine offers an explanation of why these sequences are called hailstones.
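The iteration just described can be sketched in a few lines. This is an illustrative TypeScript version, not the article's C or AssemblyScript code; the names hstoneStep and hailstoneSequence are assumptions made for this sketch.

```typescript
// One application of the hailstone rule: N/2 if N is even, 3N + 1 if N is odd.
function hstoneStep(n: number): number {
  return n % 2 === 0 ? n / 2 : 3 * n + 1;
}

// The sequence from a starting value through the first occurrence of 1.
function hailstoneSequence(start: number): number[] {
  if (Math.floor(start) !== start || start < 1) {
    throw new RangeError("start must be a positive integer");
  }
  const seq: number[] = [start];
  let n = start;
  while (n !== 1) {
    n = hstoneStep(n);
    seq.push(n);
  }
  return seq;
}

const seq24 = hailstoneSequence(24);
console.log(seq24.join(","));   // 24,12,6,3,10,5,16,8,4,2,1
console.log(seq24.length - 1);  // 10 calls to reach the first 1
```

The sequence is cut off at the first 1, because every later value just repeats 4, 2, 1.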

Note that powers of 2 converge quickly, requiring just N divisions by 2 to reach 1 from 2^N; for example, 32 = 2^5 has a convergence length of 5, and 64 = 2^6 has a convergence length of 6. Of interest here is the length of the sequence from the initial argument to the first occurrence of 1. My code examples in C and TypeScript compute the length of a hailstone sequence.

The Collatz conjecture is that a hailstone sequence converges to 1 no matter what the initial value N > 0 happens to be. No one has found a counterexample to the Collatz conjecture, nor has anyone found a proof that would elevate the conjecture to a theorem. The conjecture, simple as it is to test with a program, remains a deeply challenging problem in mathematics.
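Testing the conjecture with a program, for a bounded range of starting values, might look like the following. This is only a sketch: the bound of 10000 and the step budget of 1000 are arbitrary assumptions, the name hstoneLength is illustrative, and a finite check of this kind says nothing about values beyond the bound.

```typescript
// Number of hailstone steps from n to the first 1, or -1 if the step
// budget runs out first (which the conjecture says should never happen
// for a large enough budget).
function hstoneLength(n: number, maxSteps: number): number {
  let len = 0;
  while (n !== 1) {
    if (len >= maxSteps) return -1;
    n = n % 2 === 0 ? n / 2 : 3 * n + 1;
    len++;
  }
  return len;
}

const LIMIT = 10000;     // assumed upper bound on starting values
const MAX_STEPS = 1000;  // assumed step budget per starting value
let longestStart = 1;
let longestLen = 0;
let failures = 0;
for (let n = 1; n <= LIMIT; n++) {
  const len = hstoneLength(n, MAX_STEPS);
  if (len < 0) {
    failures++;
  } else if (len > longestLen) {
    longestLen = len;
    longestStart = n;
  }
}
console.log(`starting values that failed to reach 1: ${failures}`);
console.log(`longest sequence: start ${longestStart}, length ${longestLen}`);
console.log(hstoneLength(32, MAX_STEPS), hstoneLength(64, MAX_STEPS)); // 5 6
```

The final line confirms the convergence lengths for the powers of 2 mentioned earlier.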

From C to WebAssembly in one step

The hstoneCL program below is a non-Web application that can be compiled with a regular C compiler (for example, GNU or Clang). The program generates a random integer value N > 0 eight times and computes the length of the hailstone sequence starting at N. Two programmer-defined functions, main and hstone, are of interest when the application is later compiled into WebAssembly.

Example 1. The hstone function in C

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int hstone(int n) {
  int len = 0;
  while (1) {
    if (1 == n) break;           /* halt on 1 */
    if (0 == (n & 1)) n = n / 2; /* if n is even */
    else n = (3 * n) + 1;        /* if n is odd */
    len++;                       /* increment counter */
  }
  return len;
}

#define HowMany 8

int main(void) {
  srand(time(NULL));  /* seed random number generator */
  int i;
  puts("  Num  Steps to 1");
  for (i = 0; i < HowMany; i++) {
    int num = rand() % 100 + 1; /* + 1 to avoid zero */
    printf("%4i %7i\n", num, hstone(num));
  }
  return 0;
}

Code can be compiled and run from the command line (% is the command line prompt) on any Unix-like system:

% gcc -o hstoneCL hstoneCL.c  ## compile into executable hstoneCL
% ./hstoneCL                  ## execute

Here is the output from the example run:

  Num  Steps to 1
  88      17
   1       0
  20       7
  41     109
  80       9
  84       9
  94     105
  34      13

System languages, including C, require specialized toolchains to translate source code into a WebAssembly module. For C/C++, Emscripten is a pioneering and still widely used option, built on the well-known LLVM (Low-Level Virtual Machine) compiler infrastructure. My C examples use Emscripten, which you can install using this guide (github.com/emscripten-…).

The hstoneCL program can be webified by using Emscripten to compile the code, without any change whatsoever, into a WebAssembly module. The Emscripten toolchain also creates an HTML page, together with JS glue (in asm.js), that mediates between the DOM and the WebAssembly module that computes the hstone function. Here are the steps:

  1. Compile the non-Web program hstoneCL into WebAssembly:

% emcc hstoneCL.c -o hstone.html ## generates hstone.js and hstone.wasm as well

The file hstoneCL.c contains the source code shown above, and the -o (output) flag specifies the name of the HTML file. Any name is fine, but the generated JS code and the WebAssembly binary then have the same name (in this case, hstone.js and hstone.wasm, respectively). Older versions of Emscripten (prior to 13) may need the flag -s WASM=1 included in the compile command.

  2. Use the Emscripten development Web server (or equivalent) to host the webified application:

% emrun --no_browser --port 9876 . ## . is current working directory, any port number you like

To suppress warning messages, the flag --no_emrun_detect can be included. This command starts the Web server, which hosts all of the resources in the current working directory; in particular, hstone.html, hstone.js, and hstone.wasm.

  3. Open the URL http://localhost:9876/hstone.html in a WebAssembly-enabled browser (for example, Chrome or Firefox).

This screenshot shows the output from my sample run with Firefox.

Figure 1. Web-hstone application

The result is remarkable because the complete compilation process requires only one command and does not require any changes to the original C program.

Fine-tuning the hstone program for the Web

The Emscripten toolchain does a good job of compiling a C program into a WebAssembly module and generating the required JS glue, but the result is typical machine-generated code. For example, the generated asm.js file is almost 100 KB in size. The JS code handles multiple scenarios and does not use the latest WebAssembly API. A simplified version of the webified hstone program makes it easier to focus on how the WebAssembly module (in the hstone.wasm file) interacts with the JS glue (in the hstone.js file).

There is another issue: WebAssembly code need not mirror the function boundaries of a source program such as one in C. For example, the C program hstoneCL has two user-defined functions, main and hstone. The generated WebAssembly module exports a function named _main but does not export a function named _hstone. (It is worth noting that main is the entry point of a C program.) The body of the C hstone function might be in some unexported function, or simply folded into _main. The exported WebAssembly functions are exactly the ones that the JS glue can call by name. Which source-language functions are exported by name in the WebAssembly code can, however, be specified.

Example 2. The revised hstone program

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <emscripten/emscripten.h>

int EMSCRIPTEN_KEEPALIVE hstone(int n) {
  int len = 0;
  while (1) {
    if (1 == n) break;           /* halt on 1 */
    if (0 == (n & 1)) n = n / 2; /* if n is even */
    else n = (3 * n) + 1;        /* if n is odd */
    len++;                       /* increment counter */
  }
  return len;
}

As shown above, the revised hstoneWA program has no main function, which is no longer needed because the program is not meant to run as a standalone application but only as a WebAssembly module with a single exported function. The EMSCRIPTEN_KEEPALIVE directive (defined in the header file emscripten.h) instructs the compiler to export the _hstone function in the WebAssembly module. The naming convention is simple: a C function such as hstone keeps its name, but with a single underscore as the first character in WebAssembly (_hstone in this case). Other compilers to WebAssembly follow different naming conventions.

To confirm that this approach works, the compilation step can be simplified to generate only the WebAssembly module and the JS glue, but not the HTML:

% emcc hstoneWA.c -o hstone2.js  ## we'll provide our own HTML file

The HTML file can now be simplified to this handwritten file:


      
<html>
  <head>
    <meta charset="utf-8"/>
    <script src="hstone2.js"></script>
  </head>
  <body/>
</html>

The HTML document loads the JS file, which in turn retrieves and loads the WebAssembly binary hstone2.wasm. By the way, the new WASM file is only half the size of the original example.

The program code can be compiled as before and then launched using the built-in Web server:

% emrun --no_browser --port 7777 .  ## new port number for emphasis

After the revised HTML document is requested in a browser (in this case, Chrome), the browser's Web console can be used to confirm that the hstone function has been exported as _hstone. Here is my session in the Web console, with ## again as the comment symbol:

> _hstone(27)   ## invoke _hstone by name
< 111           ## output
> _hstone(7)    ## again
< 16            ## output

The EMSCRIPTEN_KEEPALIVE directive is the straightforward way to have the Emscripten compiler produce a WebAssembly module that exports any function of interest to the JS glue that this compiler likewise produces. A customized HTML document, with whatever hand-crafted JS is appropriate, can then call the functions exported from the WebAssembly module. Hats off to Emscripten for this clean approach.

Compile TypeScript to WebAssembly

The next code example is in TypeScript, which is JS with explicit data types. The setup requires Node.js and its npm package manager. The following npm command installs AssemblyScript, which is a WebAssembly compiler for TypeScript code:

% npm install -g assemblyscript  ## install the AssemblyScript compiler

The TypeScript program hstone.ts consists of a single function, again named hstone. Data types such as i32 (32-bit integer) now follow, rather than precede, the parameter and local variable names (n and len, respectively, in this case):

export function hstone(n: i32): i32 { // will be exported in WebAssembly
  let len: i32 = 0;
  while (true) {
    if (1 == n) break;            // halt on 1
    if (0 == (n & 1)) n = n / 2;  // if n is even
    else n = (3 * n) + 1;         // if n is odd
    len++;                        // increment counter
  }
  return len;
}

The hstone function takes one argument of type i32 and returns a value of the same type. The function's body is essentially the same as in the C example. The code can be compiled into WebAssembly as follows:

% asc hstone.ts -o hstone.wasm  ## compile a TypeScript file into WebAssembly
Copy the code

The WASM file hstone.wasm is only 14 KB in size.
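Types such as i32 are defined by AssemblyScript, not by plain TypeScript. To exercise the same function body under an ordinary TypeScript toolchain, one option is to alias i32 to number. The sketch below rests on that assumption; the alias is introduced here only for illustration, and it does not reproduce 32-bit overflow behavior, so it is suitable only for small arguments.

```typescript
// Assumption: under plain TypeScript, i32 is just an alias for number.
// AssemblyScript itself treats i32 as a true 32-bit integer type.
type i32 = number;

function hstone(n: i32): i32 {
  let len: i32 = 0;
  while (true) {
    if (1 == n) break;            // halt on 1
    if (0 == (n & 1)) n = n / 2;  // n is even here, so the division is exact
    else n = (3 * n) + 1;         // n is odd
    len++;                        // increment counter
  }
  return len;
}

console.log(hstone(27)); // 111
console.log(hstone(7));  // 16
```

The two results match the values reported for _hstone in the Web console session with the Emscripten build.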

To highlight the details of how a WebAssembly module is loaded, the handwritten HTML file below (index.html, available on my website at condor.depaul.edu/mkalin) contains a script that fetches and loads the WebAssembly module hstone.wasm and then instantiates this module, so that the exported hstone function can be called in the browser console for confirmation.

Example 3. The HTML page for the TypeScript code


      
<html>
  <head>
    <meta charset="utf-8"/>
    <script>
      fetch('hstone.wasm').then(response =>            // Line 1
      response.arrayBuffer()                           // Line 2
      ).then(bytes =>                                  // Line 3
      WebAssembly.instantiate(bytes, {imports: {}})    // Line 4
      ).then(results => {                              // Line 5
      window.hstone = results.instance.exports.hstone; // Line 6
      });
    </script>
  </head>
  <body/>
</html>

The script element in the HTML page above can be explained line by line. The fetch call in Line 1 uses the Fetch API to get the WebAssembly module from the Web server that hosts the HTML page. When the HTTP response arrives, the WebAssembly module does so as a sequence of bytes, which are stored in the ArrayBuffer of Line 2 in the script. These bytes make up the WebAssembly module, which is the code compiled from the TypeScript file. The module has no imports, as indicated at the end of Line 4.

The WebAssembly module is instantiated at the start of Line 4. A WebAssembly module is akin to a non-static class with non-static members in an object-oriented language such as Java. The module contains variables, functions, and various supporting artifacts; but like a non-static class, the module must be instantiated to be usable, in this case in the Web console, but more typically in the appropriate JS glue code.

Line 6 of the script exports the original TypeScript function hstone under the same name. This WebAssembly function is now available to any JS glue code, as another session in the browser console would confirm.

WebAssembly now has a more concise API for fetching and instantiating a module. The new API reduces the script above to just the fetch and instantiate operations. The longer version shown here has the benefit of exhibiting the details, in particular the representation of a WebAssembly module as a byte array that is instantiated as an object with exported functions.

The plan is for web pages to load WebAssembly modules in the same way as JS ES2015 modules:

<script type='module'>...</script>

JS will then fetch, compile, and otherwise process the WebAssembly module as if it were loading another JS module.

Text format language

WebAssembly binaries can be translated into a text equivalent. A binary typically resides in a file with the .wasm extension, whereas its human-readable text counterpart resides in a file with the .wat extension. WABT is a set of tools for working with WebAssembly, including tools for converting between the wasm and wat formats. The conversion tools include wasm2wat, wasm2c, and wat2wasm.

The text format language adopts the S-expression (S for symbolic) syntax popularized by Lisp. An S-expression (sexpr for short) represents a tree as a list with arbitrarily many sublists. For example, this sexpr occurs near the end of the WAT file for the TypeScript example:

(export "hstone" (func $hstone)) ## export function $hstone by the name "hstone"

The tree representation is:

        export        ## root
          |
     +----+----+
     |         |
  "hstone"    func    ## left and right children
               |
            $hstone   ## single child
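The list-of-sublists view of a sexpr can be made concrete with a small parser that turns the text into nested arrays. This is an illustrative sketch only, not how WABT or a browser parses the text format; the names Sexpr, tokenize, parseAt, and parseSexpr are assumptions, and the parser handles only parentheses, double-quoted strings, and bare atoms.

```typescript
// A parsed S-expression is either an atom (a string) or a list of S-expressions.
type Sexpr = string | Sexpr[];

// Split the text into parentheses, double-quoted strings, and bare atoms.
function tokenize(text: string): string[] {
  const tokens = text.match(/\(|\)|"[^"]*"|[^\s()"]+/g);
  return tokens === null ? [] : tokens;
}

// Recursive descent over the token list; pos is the index of the next token.
function parseAt(tokens: string[], pos: number): { node: Sexpr; next: number } {
  const tok = tokens[pos];
  if (tok === undefined) throw new SyntaxError("unexpected end of input");
  if (tok === ")") throw new SyntaxError("unexpected )");
  if (tok !== "(") return { node: tok, next: pos + 1 }; // an atom is a leaf
  const list: Sexpr[] = [];
  let i = pos + 1;
  while (tokens[i] !== ")") {
    const child = parseAt(tokens, i); // throws if the tokens run out
    list.push(child.node);
    i = child.next;
  }
  return { node: list, next: i + 1 }; // skip the closing parenthesis
}

function parseSexpr(text: string): Sexpr {
  return parseAt(tokenize(text), 0).node;
}

const tree = parseSexpr('(export "hstone" (func $hstone))');
console.log(JSON.stringify(tree)); // ["export","\"hstone\"",["func","$hstone"]]
```

The outer array corresponds to the root export, and the inner array is the func subtree with $hstone as its single child.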

In text format, a WebAssembly module is a sexpr whose first item is module, which is the root of the tree. Here is a simple example of a module that defines and exports a single function, which takes no arguments but returns the constant 9876:

(module
  (func (result i32)
    (i32.const 9876)
  )
  (export "simpleFunc" (func 0)) ;; 0 is the unnamed function's index
)

The function is defined without a name (that is, as a lambda) and is exported by reference to its index 0, which is the index of the first function defined in the module. The export name is given as a string, in this case "simpleFunc".
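To see such a module run, its binary form can be instantiated with the WebAssembly JS API. The byte array below is a hand-assembled encoding of the module above, offered as a sketch under that assumption; in practice a tool such as wat2wasm would produce the bytes. The synchronous Module and Instance constructors are used here only because the module is tiny.

```typescript
// Hand-assembled binary for the simpleFunc module (an assumption of this
// sketch; normally a tool such as wat2wasm would generate these bytes).
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d,                    // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00,                    // binary format version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,  // type section: () -> i32
  0x03, 0x02, 0x01, 0x00,                    // function section: func 0 has type 0
  0x07, 0x0e, 0x01, 0x0a,                    // export section: 1 export, 10-byte name
  0x73, 0x69, 0x6d, 0x70, 0x6c, 0x65, 0x46, 0x75, 0x6e, 0x63, // "simpleFunc"
  0x00, 0x00,                                // export kind: function, index 0
  0x0a, 0x08, 0x01, 0x06, 0x00,              // code section: 1 body, no locals
  0x41, 0x94, 0xcd, 0x00,                    // i32.const 9876 (signed LEB128)
  0x0b,                                      // end of function body
]);

const wasmModule = new WebAssembly.Module(bytes);            // compile the bytes
const instance = new WebAssembly.Instance(wasmModule, {});   // no imports needed
const simpleFunc = instance.exports.simpleFunc as () => number;
console.log(simpleFunc()); // 9876
```

The export section carries the string name, and the code section carries the single i32.const instruction that makes up the function body.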

A function in text format follows a standard pattern, which can be depicted as follows:

(func <signature> <local vars> <body>)

The signature specifies parameters (if any) and return values (if any). For example, here is the signature of an unnamed function that takes two 32-bit integer arguments and returns a 64-bit integer value:

(func (param i32) (param i32) (result i64) ...)

Names can be assigned to functions, parameters, and local variables. Names start with a dollar sign:

(func $foo (param $a1 i32) (param $a2 f32) (local $n1 f64) ...)

The body of a WebAssembly function reflects the underlying stack machine architecture of the language. Stack storage serves as a scratchpad. Consider this example of a function that doubles its integer argument and returns the result:

(func $doubleit (param $p i32) (result i32)
  get_local $p
  get_local $p
  i32.add)

Each get_local operation, which works on parameters as well as local variables, pushes the 32-bit integer argument onto the stack. The i32.add operation then pops the top two (and currently only) values from the stack to perform the addition. Finally, the sum from the add operation is the only value on the stack and thereby becomes the value returned from the $doubleit function.
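The push and pop behavior described above can be imitated with a toy interpreter. The following TypeScript sketch models only the two instruction kinds that $doubleit uses; the Instr type and the runBody function are assumptions made for illustration and bear no relation to how a real WebAssembly engine is implemented.

```typescript
// The two instruction kinds used by $doubleit, modeled as plain data.
type Instr =
  | { op: "get_local"; index: number }
  | { op: "i32.add" };

// Run a function body: locals hold the arguments, and the operand
// stack serves as scratch space.
function runBody(body: Instr[], locals: number[]): number {
  const operands: number[] = [];
  const pop = (): number => {
    const v = operands.pop();
    if (v === undefined) throw new Error("stack underflow");
    return v;
  };
  for (const instr of body) {
    switch (instr.op) {
      case "get_local": {
        const v = locals[instr.index];
        if (v === undefined) throw new Error("no such local");
        operands.push(v);               // push the parameter's value
        break;
      }
      case "i32.add": {
        const rhs = pop();
        const lhs = pop();
        operands.push((lhs + rhs) | 0); // wrap the sum to 32 bits
        break;
      }
    }
  }
  return pop(); // the value left on the stack is the result
}

const doubleit: Instr[] = [
  { op: "get_local", index: 0 },
  { op: "get_local", index: 0 },
  { op: "i32.add" },
];
console.log(runBody(doubleit, [21])); // 42
```

After the two pushes the stack holds 21 and 21; the add pops both and pushes 42, which is then returned.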

When WebAssembly code is translated into machine code, the WebAssembly stack, as a scratchpad, should be replaced wherever possible by general-purpose registers. This is the job of the JIT compiler, which translates WebAssembly virtual stack machine code into real machine code.

Web programmers are unlikely to write WebAssembly in text format, because compiling from some high-level language is far too attractive an option. Compiler writers, by contrast, may find it productive to work at this fine-grained level.

Conclusion

WebAssembly's goal is near-native speed. But as JS JIT compilers continue to improve, and as dialects well suited to optimization (for example, TypeScript) emerge and evolve, it is possible that JS will also achieve near-native speed. Would this make WebAssembly a wasted effort? I don't think so.

WebAssembly addresses another traditional goal in computing: meaningful code reuse. As the examples in this article show, code in a suitable language, such as C or TypeScript, can be translated readily into a WebAssembly module that plays well with JS code, the glue that connects the range of technologies used on the Web. WebAssembly is thus an attractive way to reuse legacy code and to broaden the use of new code. For example, a high-performance program for image processing, written originally as a desktop application, might also be useful in a Web application. WebAssembly then becomes an attractive path to reuse. (For new Web modules that are compute-bound, WebAssembly is a sound choice.) My hunch is that WebAssembly will thrive as much for reuse as for performance.
