WebAssembly is a joint effort by Google, Microsoft, Mozilla, Apple, and others to create a common binary and text format for the Web. Let’s demystify it step by step so that you can put WebAssembly to work in real projects.
1. Introduction
As you know, whether it’s Chrome, Firefox, Safari, Edge, or any other browser, the language that runs natively is JavaScript. WebAssembly was created so that code written in other languages can also run in the browser, as described in The WebAssembly Advanced Series 2: What a WebAssembly Is. We rarely need to write WebAssembly by hand. Instead, we compile to WebAssembly from a high-level language, which lets us reuse a large body of existing code. WebAssembly also tends to load and execute faster than equivalent JavaScript.
So why can WebAssembly outperform JavaScript? The answer depends on where each of them sits in the compilation process.
2. Compilation steps
As programmers, we write source code every day in a variety of high-level languages. But a machine cannot understand source text directly, so a compiler has to translate it, stage by stage, into object code.
(1) Preprocessing
Preprocessing (also called precompilation) handles the directives in the source code that start with # (such as #include and #define in C and C++), rewriting the original source file before compilation proper begins. After preprocessing, the code handed to the compiler can look very different from the code you wrote.
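To make the idea concrete, here is a minimal sketch of macro expansion, assuming only simple object-like #define directives. The function name expand_defines is hypothetical, and a real C preprocessor does far more (#include, conditionals, function-like macros).

```python
import re


def expand_defines(source: str) -> str:
    """Expand object-like #define macros; a toy model, not a real preprocessor."""
    macros = {}
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            # "#define NAME VALUE" -> remember the macro and drop the directive line.
            _, name, value = stripped.split(None, 2)
            macros[name] = value
            continue
        # Replace every whole-word occurrence of each known macro name.
        for name, value in macros.items():
            line = re.sub(rf"\b{re.escape(name)}\b", lambda m, v=value: v, line)
        output.append(line)
    return "\n".join(output)


source = "#define SIZE 10\nint buffer[SIZE];"
print(expand_defines(source))  # prints: int buffer[10];
```

The directive line disappears and SIZE is replaced by its value, which is why the compiler never sees the macro itself.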
(2) Lexical analysis
Lexical analysis is the process of converting a sequence of characters into a sequence of words (tokens). The program or function that performs it is called a lexical analyzer (lexer for short), also known as a scanner, and it is usually called by the parser.
Lexical analysis is the first stage of compilation proper. Its task is to scan the source code as a character stream from left to right, recognize keywords, identifiers, literals, operators, and other symbols according to the language’s lexical rules, and emit them as tokens in order.
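A lexer for a tiny language can be sketched with regular expressions. The token kinds and the single keyword let below are assumptions made up for illustration, not part of any real language.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical lexical rules for a toy language; order matters for alternation.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
KEYWORDS = {"let"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Scan the character stream left to right and yield tokens in order."""
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        yield Token(kind, text)


for token in tokenize("let x = 1 + 22"):
    print(token.kind, token.text)
```

Note that the lexer knows nothing about grammar: it would happily tokenize "= = let 1", and rejecting that is the parser’s job.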
(3) Syntax analysis
Syntax analysis (parsing) takes the token sequence and determines its grammatical structure according to a given formal grammar. The program or function that performs it is called a parser, and its output is passed on to semantic analysis.
Syntax analysis is a logical stage in the compilation process. Its task is to combine the tokens produced by lexical analysis into grammatical phrases such as “program”, “statement”, and “expression”.
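As an illustration, the sketch below is a small recursive-descent parser for arithmetic expressions. The grammar (expression, term, factor) and the tuple-based tree are assumptions chosen for brevity, not the approach of any particular compiler.

```python
import re


def tokenize(source):
    """Very small lexer: integers, +, *, and parentheses."""
    return re.findall(r"\d+|[+*()]", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expression(self):
        # expression := term ("+" term)*
        node = self.term()
        while self.peek() == "+":
            self.take()
            node = ("add", node, self.term())
        return node

    def term(self):
        # term := factor ("*" factor)*
        node = self.factor()
        while self.peek() == "*":
            self.take()
            node = ("mul", node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | "(" expression ")"
        token = self.take()
        if token == "(":
            node = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isdigit():
            return ("num", int(token))
        raise SyntaxError(f"unexpected token {token!r}")


def parse(source):
    parser = Parser(tokenize(source))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))  # prints: ('add', ('num', 1), ('mul', ('num', 2), ('num', 3)))
```

Because term is parsed inside expression, multiplication binds tighter than addition without any explicit precedence table.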
(4) Semantic analysis
Semantic analysis is one of the most substantial stages of the compilation process. Its task is to check the context-dependent properties of source code that is already structurally correct, most notably by type checking.
After syntax analysis we already have a preliminary abstract syntax tree in which each node is an expression, but we do not yet know whether each node is meaningful. Semantic analysis therefore traverses the whole tree, determines the type of the expression at each node, and verifies that it is legal.
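The traversal described above can be sketched as a recursive type checker. The node kinds ("num", "str", "add") and the rule that both operands of an addition must share a type are assumptions of this toy example.

```python
def check_type(node):
    """Return the type of an expression node, or raise TypeError if it is illegal."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "string"
    if kind == "add":
        left = check_type(node[1])
        right = check_type(node[2])
        # Toy rule: both operands of "add" must have the same type.
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown node kind: {kind}")


valid = ("add", ("num", 1), ("num", 2))
print(check_type(valid))  # prints: int

invalid = ("add", ("num", 1), ("str", "two"))
try:
    check_type(invalid)
except TypeError as error:
    print("rejected:", error)  # prints: rejected: cannot add int and string
```

Both trees are grammatically well formed, so only this stage can tell them apart.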
(5) Abstract syntax tree
An abstract syntax tree (AST) is an abstract representation of the syntactic structure of source code. It represents that structure as a tree in which each node stands for a construct in the source code.
A preliminary AST is produced during parsing. Semantic analysis then checks and annotates it, and the result is the final AST.
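One easy way to look at a real AST is Python’s standard ast module, used here purely as an illustration (the helper describe and the sample statement are assumptions of this sketch).

```python
import ast


def describe(node, depth=0):
    """Return one indented line per AST node, labelled with the node's class name."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(describe(child, depth + 1))
    return lines


tree = ast.parse("total = price * 2 + tax")
assign = tree.body[0]  # the single assignment statement

# The right-hand side is an addition whose left operand is a multiplication,
# which is how the tree encodes operator precedence.
print(type(assign.value).__name__, type(assign.value.op).__name__)
print(type(assign.value.left).__name__, type(assign.value.left.op).__name__)

print("\n".join(describe(tree)))
```

The tree keeps the structure of the statement but drops details such as whitespace, which is what makes it "abstract".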
(6) Intermediate code
After the first few phases, we have the final AST. However, an AST cannot run directly on hardware, and each hardware platform has its own instruction set and assembly language.
An intermediate representation (IR) is therefore introduced between the AST and the assembly code of the various platforms, and its design smooths out the differences between hardware platforms. The strength of intermediate code is that it is both platform-independent and language-independent.
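One common style of intermediate code is three-address code, where each instruction has at most one operator and stores its result in a temporary. The sketch below lowers a tuple-based expression tree into that form. The tree shape and the t1, t2 naming are assumptions of this example, and real IRs such as LLVM IR are much richer.

```python
def lower(node, code, counter):
    """Append three-address instructions for node to code; return the operand name."""
    kind = node[0]
    if kind == "num":
        return str(node[1])  # constants are used directly as operands
    left = lower(node[1], code, counter)
    right = lower(node[2], code, counter)
    counter[0] += 1
    temp = f"t{counter[0]}"
    code.append(f"{temp} = {kind} {left}, {right}")
    return temp


def to_three_address(tree):
    code = []
    result = lower(tree, code, [0])
    return code, result


tree = ("add", ("num", 1), ("mul", ("num", 2), ("num", 3)))
code, result = to_three_address(tree)
print("\n".join(code))
# prints:
# t1 = mul 2, 3
# t2 = add 1, t1
```

Nothing in these two lines mentions a source language or a CPU, which is the property the text above relies on.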
(7) Object code
Computers execute machine instructions, which assembly language expresses in readable form: move this many bytes to that memory address, and so on. The code generator for the target platform therefore translates the intermediate code derived from the AST into that platform’s assembly code, and the assembler turns it into object code that the machine can understand.
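To show how the same intermediate code can become different target code, here is a sketch with two code generators. Both targets are hypothetical machines invented for this example, not real instruction sets.

```python
# Intermediate code as (operator, destination, left operand, right operand) tuples.
IR = [("mul", "t1", "2", "3"), ("add", "t2", "1", "t1")]


def emit_register_machine(ir):
    """Code generator for a hypothetical three-operand register machine."""
    return [f"{op.upper()} {dest}, {a}, {b}" for op, dest, a, b in ir]


def emit_stack_machine(ir):
    """Code generator for a hypothetical stack machine."""
    lines = []
    for op, dest, a, b in ir:
        # Push both operands, apply the operator, then store the result.
        lines += [f"push {a}", f"push {b}", op, f"pop {dest}"]
    return lines


print(emit_register_machine(IR))  # prints: ['MUL t1, 2, 3', 'ADD t2, 1, t1']
print(emit_stack_machine(IR))
```

The input is identical in both calls. Only the code generator changes from one target to the next.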
3. Compiler front end and back end
We can treat the intermediate code as a dividing line. The stages before it are called the compiler front end, and the stages after it are called the compiler back end.
(1) Compiler front end
The front end covers preprocessing, lexical analysis, syntax analysis, semantic analysis, and construction of the abstract syntax tree, and it deals with language-specific features. Keywords, grammar rules, and type-checking rules differ from language to language, and some languages have no preprocessing stage at all. Still, each language can have its own front end, and as long as that front end emits intermediate code in the agreed standard form, the code can be handed to any back end. The front end is therefore language-dependent.
(2) Compiler back end
Strictly speaking, the back end is only object code generation, but in practice it also includes linking object code into an executable. The back end is responsible for handling the differences between platforms: from the standard intermediate code produced by the various language front ends, it generates the object code for each target. The back end is therefore platform-dependent but language-independent.
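The division of labor can be sketched with two front ends that feed one shared back end. Both input notations and the instruction format are hypothetical, and each front end handles only a single binary operation to keep the example short.

```python
OPS = {"+": "add", "*": "mul"}


def infix_front_end(source):
    """Hypothetical front end for input such as '2 * 3'."""
    left, op, right = source.split()
    return (OPS[op], int(left), int(right))


def prefix_front_end(source):
    """Hypothetical front end for input such as '(* 2 3)'."""
    op, left, right = source.strip("()").split()
    return (OPS[op], int(left), int(right))


def back_end(ir):
    """Shared back end: turns the common IR into one made-up instruction."""
    op, left, right = ir
    return f"{op} {left}, {right}"


# Two different source notations produce the same intermediate code,
# so one back end serves both of them.
print(back_end(infix_front_end("2 * 3")))     # prints: mul 2, 3
print(back_end(prefix_front_end("(* 2 3)")))  # prints: mul 2, 3
```

Supporting a new language means writing only a new front end, and supporting a new platform means writing only a new back end.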
4. Compilation tools
(1) GCC
GCC (GNU Compiler Collection) contains both the front-end and back-end modules. Its front ends support C, C++, Fortran, Pascal, Objective-C, Java, and other languages, and its back ends support more than 30 platforms, including x86, MIPS, Alpha, ARM, AVR, IA-64, SPARC, and PowerPC.
Although GCC is widely used, it faces a serious challenger: Clang/LLVM, a rising star that is catching up with GCC across the board.
(2) Clang/LLVM
Clang is a compiler for C, C++, Objective-C, and Objective-C++, written in C++ and built on LLVM. It was originally published under the BSD-style LLVM license. So why develop Clang when GCC already exists? Clang’s main advantages over GCC are that it is lightweight and highly modular, it compiles quickly, it uses little memory, and it is easy to build other tools on top of.
Clang is only a compiler front end, so it is paired with the LLVM back end to form a complete compiler toolchain.
5. Where WebAssembly sits in the compilation process
So where does WebAssembly fit into all of this? As the saying goes, you can’t eat hot tofu in a hurry: we needed the background above before the answer would make sense.
WebAssembly sits at the intermediate-code position. Like Java bytecode, it is compiled once and runs everywhere, so it is cross-platform. Because WebAssembly reaches the browser as intermediate code, the front-end stages have already been done ahead of time, whereas JavaScript has to be parsed and compiled at run time. That gives WebAssembly a significant performance advantage.
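WebAssembly code is a sequence of instructions for a stack machine, so an expression tree maps onto it quite directly. The sketch below emits WebAssembly text-format instructions (i32.const, i32.add, i32.mul) for a tuple-based tree. The tree shape and the function name emit_wat are assumptions of this example, and the output is only an instruction sequence for a function body, not a complete module.

```python
WASM_OPS = {"add": "i32.add", "mul": "i32.mul"}


def emit_wat(node):
    """Return WebAssembly text-format instructions that evaluate the expression tree."""
    if node[0] == "num":
        return [f"i32.const {node[1]}"]
    # Post-order walk: both operands go on the stack before the operator runs.
    return emit_wat(node[1]) + emit_wat(node[2]) + [WASM_OPS[node[0]]]


tree = ("add", ("num", 1), ("mul", ("num", 2), ("num", 3)))
print("\n".join(emit_wat(tree)))
# prints:
# i32.const 1
# i32.const 2
# i32.const 3
# i32.mul
# i32.add
```

By the time such instructions reach the browser, lexing, parsing, and type checking of the original source have already happened on the developer’s machine.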
6. Closing thoughts
Without WebAssembly, HTML, CSS, and JavaScript would become the de facto assembly languages of the front-end world, and other languages would eventually treat them as “compilation targets.” The advent of WebAssembly offers a better option: near-native execution speed, open source, compatibility, cross-platform coverage, and a chance to leave JavaScript’s legacy baggage behind. Why not?