This article is translated

A Crash Course in Assembly

Originally written by Lin Clark

The original address: hacks.mozilla.org/2017/02/a-c…

This is the third installment in the WebAssembly series. If you haven’t read the others, I recommend starting from the very beginning.

By understanding “what assembly is” and “how the compiler generates assembly code,” we can better understand how WebAssembly works.

In the article on JIT, I mentioned that communicating with a machine is like communicating with an alien.

I want to talk now about how the alien brain works — how the machine brain interprets and makes sense of the information it receives in communication.

One part of a machine’s brain is devoted to thinking: addition, subtraction, and logical operations. Other parts provide short-term and long-term memory.

We’ve given names to these different parts.

  • The part that does the thinking is called the Arithmetic Logic Unit (ALU).
  • The part that provides short-term memory is called the registers.
  • The part that provides long-term memory is called Random Access Memory, or RAM.
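As a rough illustration of how these three parts relate, here is a minimal sketch of a toy machine. The class name `ToyMachine`, the number of registers, the RAM size, and the operation names are all assumptions made for this example; real hardware is far more involved.

```python
class ToyMachine:
    """A toy model of the three named parts (all sizes are assumed)."""

    def __init__(self):
        self.registers = [0] * 8   # short-term memory: a handful of fast slots
        self.ram = [0] * 256       # long-term memory: many more, slower slots

    def alu(self, operation, a, b):
        """The thinking part: arithmetic and logical operations."""
        if operation == "add":
            return a + b
        if operation == "sub":
            return a - b
        if operation == "and":
            return a & b
        raise ValueError(f"unknown operation: {operation}")


machine = ToyMachine()
machine.ram[10] = 2
machine.ram[11] = 3

# Values are copied from RAM into registers before the ALU works on them.
machine.registers[1] = machine.ram[10]
machine.registers[2] = machine.ram[11]
machine.registers[1] = machine.alu("add", machine.registers[1], machine.registers[2])
print(machine.registers[1])  # 5
```

The point of the sketch is only the division of labor: the ALU computes, the registers hold the values being worked on, and RAM holds everything else.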

The statements in machine code are called instructions.

What happens when an instruction enters the machine’s brain? It gets split up into different parts, each with its own meaning.

How an instruction is split up depends on how that particular machine’s brain is wired.

For example, a machine brain might be wired to always take the first six bits and pipe them to the ALU. Based on the positions of the 0s and 1s, the ALU recognizes that these six bits mean “add two things together.”

These six bits are called the operation code (opcode for short) because, as the name suggests, they tell the ALU which operation to perform.

After that, the machine brain takes the next two chunks, each three bits long. These two chunks are the addresses of the registers that hold the numbers to be added.
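The splitting described above can be sketched with a few bit operations. This is a minimal sketch that assumes a 12-bit instruction laid out exactly as in the example (six opcode bits followed by two three-bit register addresses); the function name `decode` and the opcode value `0b000001` for "add" are made up for illustration.

```python
# Assumed 12-bit layout, following the example in the text:
#   bits 11..6 -> opcode (6 bits)
#   bits 5..3  -> address of the first register (3 bits)
#   bits 2..0  -> address of the second register (3 bits)

def decode(instruction):
    """Split a 12-bit instruction into (opcode, reg_a, reg_b)."""
    opcode = (instruction >> 6) & 0b111111  # keep the top six bits
    reg_a = (instruction >> 3) & 0b111      # keep the middle three bits
    reg_b = instruction & 0b111             # keep the bottom three bits
    return opcode, reg_a, reg_b


# Opcode 0b000001 (assumed here to mean "add"), registers 1 and 2.
word = 0b000001_001_010
print(decode(word))  # (1, 1, 2)
```

A machine wired differently would use different shift amounts and masks, which is why the same bit pattern can mean different things on different machines.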

Note the annotations above the machine code in the figure, which help us humans understand what is going on. This is assembly, also called symbolic machine code. It is a way for humans to make sense of machine code.
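To make the idea of "symbolic machine code" concrete, here is a small sketch that converts between bits and readable text for the toy 12-bit format from the example. The mnemonics `ADD` and `SUB`, their opcode values, and the `R1`-style register names are assumptions for illustration, not the encoding of any real machine.

```python
# Hypothetical opcode table; the numeric values are made up for illustration.
MNEMONICS = {0b000001: "ADD", 0b000010: "SUB"}


def disassemble(instruction):
    """Render a 12-bit instruction as a line of symbolic assembly."""
    opcode = (instruction >> 6) & 0b111111
    reg_a = (instruction >> 3) & 0b111
    reg_b = instruction & 0b111
    name = MNEMONICS.get(opcode, "???")
    return f"{name} R{reg_a} R{reg_b}"


def assemble(line):
    """Inverse direction: turn a line such as 'ADD R1 R2' into a 12-bit word."""
    name, a, b = line.split()
    opcodes = {mnemonic: code for code, mnemonic in MNEMONICS.items()}
    return (opcodes[name] << 6) | (int(a[1:]) << 3) | int(b[1:])


print(disassemble(0b000001_001_010))          # ADD R1 R2
print(format(assemble("SUB R3 R4"), "012b"))  # 000010011100
```

Because each assembly line maps to exactly one machine instruction here, the two functions are inverses of each other.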

For the machine in this example, we can see a fairly direct relationship between machine code and assembly. Because of this, there are different kinds of assembly for different machine architectures. A machine with a different architecture will likely require its own dialect of assembly.

That is to say, our translation has more than one target. There is not just one language called machine code; there are many kinds of machine code. Just as humans speak different languages, machines speak different languages too.

When translating from a human language to an alien language, you might go from English, Russian, or Chinese to alien language A or alien language B. In the programming world, this is like translating from C, C++, or Rust to x86 or ARM.

To be able to translate any high-level language into any assembly language (one for each machine architecture), we could create a whole bunch of different one-to-one compilers, one for each pair of language and architecture.

That would be rather inefficient. To solve this problem, most compilers add at least one intermediate layer between higher-level languages and assembly languages. The compiler translates higher-level language code into a form that falls somewhere between the higher-level language and assembly. This is called an Intermediate Representation, or IR.

This means that the compiler can translate any of these higher-level languages into the IR. Another part of the compiler can then take the IR and compile it into assembly code for a specific target architecture.
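A quick count shows why the intermediate layer helps. The sketch below uses the languages and architectures named earlier in the article; the list names are chosen for this example only.

```python
languages = ["C", "C++", "Rust"]  # source languages named in the text
targets = ["x86", "ARM"]          # target architectures named in the text

# Without an IR: one dedicated translator for every (language, target) pair.
direct = [(lang, tgt) for lang in languages for tgt in targets]

# With an IR: one translator per language into the IR,
# plus one translator per target out of the IR.
into_ir = [(lang, "IR") for lang in languages]
out_of_ir = [("IR", tgt) for tgt in targets]

print(len(direct))                     # 6
print(len(into_ir) + len(out_of_ir))   # 5
```

With only three languages and two targets the saving is small, but the direct approach grows as a product while the IR approach grows as a sum. Ten languages and ten targets would need 100 direct translators versus 20 with an IR.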

The front end of the compiler translates higher-level languages into IR. The back end of the compiler translates from IR to assembly code for the target architecture.
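The front end and back end split can be sketched as a toy pipeline. This is a heavily simplified illustration, not how a production compiler works: the source "language" is a single `x = a + b` statement, the IR is a list of tuples invented for this example, and the two back ends emit assembly-like text that only imitates the two-operand and three-operand styles rather than valid x86 or ARM code.

```python
def front_end(source):
    """Translate a tiny 'x = a + b' statement into a list of IR tuples."""
    dest, expr = [part.strip() for part in source.split("=")]
    left, op, right = expr.split()
    ir_op = {"+": "add", "-": "sub"}[op]
    return [(ir_op, dest, left, right)]


def back_end_two_operand(ir):
    """x86-like style: copy the left operand, then operate in place."""
    lines = []
    for op, dest, left, right in ir:
        lines.append(f"mov {dest}, {left}")
        lines.append(f"{op} {dest}, {right}")
    return lines


def back_end_three_operand(ir):
    """ARM-like style: destination and both sources on one line."""
    return [f"{op.upper()} {dest}, {left}, {right}" for op, dest, left, right in ir]


ir = front_end("x = a + b")
print(ir)                          # [('add', 'x', 'a', 'b')]
print(back_end_two_operand(ir))    # ['mov x, a', 'add x, b']
print(back_end_three_operand(ir))  # ['ADD x, a, b']
```

Both back ends consume the same IR, so supporting a new source language would only require a new front end, and supporting a new architecture would only require a new back end.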

Conclusion

In this article I’ve explained the concept of assembly and how compilers translate from higher-level languages into assembly code. In the next article, I’ll show you how WebAssembly interacts with these processes.