preface

Recently there have been a lot of questions on my timeline, such as “Why should front-end developers learn compiler principles?” and “How should front-end developers learn about ASTs?”, yet I have not seen anyone explain systematically and correctly how a front-end developer should actually study this subject. So I will combine my own experience, detours included, and share some tips here, in the hope that you can take fewer detours yourselves.

One last note: I am not actually a front-end developer; I just happen to be able to write some JavaScript.


Contents

Part One:

  • Why is compilation hard
  • How to learn a programming language well
  • What exactly is code
  • Regexes are context-free grammars
  • Programming languages really begin with the AST

Part Two:

  • Static analysis
  • Type inference
  • AST transformations
  • Continuations
  • Bytecode virtual machine

Why is compilation hard

The first thing people think about compiler principles is that they are hard, hard to even get started with. But why is that? To put it bluntly, isn’t compiler theory just the study of parsing one language and converting it into another? What exactly is holding us back? I think the biggest obstacle is the “programming language” itself.

I’m sure those of you reading this already know JavaScript, but do you really know JavaScript? Can you describe JavaScript’s syntax rules? Can you explain the logical structures that the syntax refers to? Do you know how JavaScript is interpreted and executed? So, do you really know JavaScript? I still don’t dare claim to be “proficient” in JavaScript, because I don’t know how to implement a JIT.

Most of the time, when we call ourselves “proficient” in a programming language, we simply mean proficient at using the language. But the heart of compiler principles is the language itself, and that requires a deep understanding of how it is constructed and how it is interpreted. Without this foundation it is difficult to learn the subject well.

Recommended reading:

  • Seven Languages in Seven Weeks — what do different programming languages look like
  • Programming language — Wikipedia
  • Programming paradigm — Wikipedia

How to learn a programming language

Some computer science programs at foreign universities like to start students off with the Lisp dialect Scheme. At first I did not understand why, until I noticed that their course assignments always end up requiring the implementation of a simple Lisp interpreter. The way these courses are arranged is brilliant: Scheme is not there so that students can write production code, but so that they can learn what programming, and the programming language itself, really is.

Lisp is just about the simplest implementation of a modern programming language, and the joke that all programming languages are Lisp dialects is more than just a joke. Simple Lisp is very easy to interpret: parsing Lisp syntax is only about as hard as parsing JSON, and beginner Lisp interpreters of a few hundred lines of code are common. Implementing a Lisp interpreter is not difficult, but doing it gives students a very basic yet very comprehensive understanding of how programming languages and programs are constructed and executed. That comprehensive understanding of programming languages is exactly what those of us who started out with C/C++ or JavaScript are missing.
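
To make that concrete, here is a minimal sketch of such a toy interpreter in JavaScript (my own illustrative example, not any particular course assignment): it supports only numbers, two built-in operators, define, and lambda, yet it already shows the whole read, parse, evaluate pipeline.

// A toy Lisp: tokenize -> read nested lists -> evaluate in an environment.
// This is only an illustrative sketch, not a complete Scheme.
const tokenize = (src) =>
  src.replace(/\(/g, " ( ").replace(/\)/g, " ) ").trim().split(/\s+/);

// Parse tokens into nested arrays; Lisp's "AST" is just nested lists.
function parse(tokens) {
  const token = tokens.shift();
  if (token === "(") {
    const list = [];
    while (tokens[0] !== ")") list.push(parse(tokens));
    tokens.shift(); // drop ")"
    return list;
  }
  const n = Number(token);
  return Number.isNaN(n) ? token : n; // symbol or number literal
}

// Evaluate an expression in an environment (a plain object chain).
function evaluate(exp, env) {
  if (typeof exp === "number") return exp;       // literal
  if (typeof exp === "string") return env[exp];  // variable lookup
  const [op, ...args] = exp;
  if (op === "define") {                         // (define name expr)
    return (env[args[0]] = evaluate(args[1], env));
  }
  if (op === "lambda") {                         // (lambda (params) body)
    const [params, body] = args;
    return (...values) => {
      const local = Object.create(env);
      params.forEach((p, i) => (local[p] = values[i]));
      return evaluate(body, local);
    };
  }
  const fn = evaluate(op, env);                  // function application
  return fn(...args.map((a) => evaluate(a, env)));
}

const globalEnv = { "+": (a, b) => a + b, "*": (a, b) => a * b };

evaluate(parse(tokenize("(define square (lambda (x) (* x x)))")), globalEnv);
console.log(evaluate(parse(tokenize("(square (+ 1 2))")), globalEnv)); // 9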

So how do you learn a programming language well? The classic answer, of course, is to bring out our classic book SICP. But considering that SICP is a serious textbook and not exactly a fun read, let me also recommend a book that is more of a popular-science introduction, “The Nature of Computation”. Start with that one, and if you still have energy or interest afterwards, go chew through SICP.

Recommended reading:

  • The Nature of Computation: An In-depth Analysis of Programs and Computers – a popular-science counterpart to SICP
  • Structure and Interpretation of Computer Programs (SICP) – the classic
  • “WeChat Mini Programs Want to Force Hot Code Updates Too, Come at Me, Tencent” – my own article on implementing a JavaScript interpreter in JavaScript

What exactly is code

As we mentioned in the previous section, Lisp is only about as hard to parse as JSON, so can we just use JSON instead of code? Sure: JavaScript’s parsed syntax tree can itself be represented in JSON, so in terms of expressiveness there is no difference between JavaScript code and JSON. Which raises the question: what exactly is code?

Code, like JSON, is a structured text data format. Here we will focus on just two characteristics: “text” and “structure”.

The first characteristic of code is that it is text, which means all the string operations we know – concatenation, slicing, substitution – can be applied to code just as well. Many programmers know how to read and write all kinds of text files, yet they never seem to realize that code files are also files that can be read, written, and modified.

Reading, writing, and manipulating code files is the first important threshold on the way into the world of compilation. Sometimes you do not even need complex algorithms to do something meaningful to code. For example, we can implement a simple Webpack just by using regular expressions to analyse import/export/require statements (a tiny sketch of that idea follows below), and in an earlier article of mine, tail-recursive code was optimized with nothing more than simple regexes. By consciously treating code files as text files, we knock code off its pedestal and let ourselves see code as text.
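
Here is what that “code is just text” idea can look like in practice, as a minimal sketch (nothing like a real bundler, and the regex is deliberately naive): scan a source string and collect its module dependencies.

// Sketch: treat code purely as text and pull out module dependencies
// with a regular expression. Real bundlers parse properly; this only
// shows how far plain string processing already gets you.
const source = `
import React from "react";
import { resolve } from "path";
const fs = require("fs");
`;

const dependencyPattern =
  /(?:import\s+[^"']*?from\s+|require\s*\(\s*)["']([^"']+)["']/g;

const dependencies = [...source.matchAll(dependencyPattern)].map((m) => m[1]);
console.log(dependencies); // [ 'react', 'path', 'fs' ]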

The second characteristic of code is structure. Apart from the literals, the rest of the code consists only of markers that identify structure and carry no real meaning of their own; the meaning given to those structures is simply how the interpreter executes the code. This characteristic asks us to see a structure in our heads when we look at code, not just a string of characters.

var a = 123 // everything except the literal 123 only marks structure

For example, in the simple JavaScript code above, var is an abstract symbol: whether it is spelled var, val, or #%$ does not matter, its only purpose is to mark this structure as a declaration with an assignment. Nor does it matter what a is; as long as the binding it identifies stays consistent, a can be replaced by any other name. The only thing that actually carries meaning is the 123, which I cannot change to 456.
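
If you drop that one line into AST Explorer, or parse it with acorn (both appear in the tool recommendations in this article), the structure becomes explicit. Roughly, with locations and some fields omitted, the ESTree-style tree looks like this:

// Simplified ESTree-style AST for: var a = 123
// The literal carries the value; every other node only marks structure.
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "a" },
          init: { type: "Literal", value: 123 },
        },
      ],
    },
  ],
};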

There was a question on Zhihu asking how some big names manage to learn a programming language in two weeks, and my answer was that two weeks is enough to build a programming language – JavaScript, after all, was designed by Brendan Eich in about ten days. I am certainly not as good as those people, but I could build a JavaScript 1.0 in two weeks, and anyone who reads the recommended books and fills in the basics can comfortably pick up a new programming language in two weeks.

Finally, being able to learn a programming language in two weeks does not mean being able to learn a whole domain in two weeks. Just as learning JavaScript does not equal learning front-end development, learning Python does not equal learning artificial intelligence (even though plenty of training courses now teach basic Python in the name of AI). Programming languages are just programming languages – just tools.

Recommended reading:

  • Why Can Tail Recursion Be Optimized? – an article I wrote that uses simple text processing to turn tail-recursive functions into loops

Recommended tools:

  • AST Explorer – see what your favourite JavaScript looks like as an AST

Regexes are context-free grammars

We are already in the fourth section of this article, but only now can we formally pick up the classic textbooks – the Dragon Book, the Tiger Book, or the Whale Book – and start studying. This section takes a brief look at the front end of the compiler: the parser.

The first part of a compiler does exactly that: it parses the structured text of the code into a data structure our computers can work with – an abstract syntax tree (AST). Parsing code is a boring, complex, and tedious business, and the difficulty comes from the complexity and ambiguity of programming language syntax design. Lisp, for example, is easy to parse because of its simple, consistent, and unambiguous syntax, at the cost of Lisp’s much-mocked parentheses.

Parsing code is usually divided into two steps. The first step is lexical analysis, which converts the text of the code into tokens. Here you will need some basic knowledge of regular expressions, because lexical analysis uses regexes to split the source text into tokens (a toy lexer is sketched below). In compiler principles we learn not only regular expressions themselves but also the DFA machinery behind them, though this part is not too difficult.
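
As a minimal sketch of what a regex-driven lexer looks like (a toy for arithmetic expressions only, nothing close to a real JavaScript lexer):

// Toy lexer: turn "1 + 2 * (3 - 4)" into tokens using regular expressions.
const tokenSpec = [
  ["number", /^\d+/],
  ["operator", /^[+\-*\/]/],
  ["lparen", /^\(/],
  ["rparen", /^\)/],
  ["whitespace", /^\s+/], // matched but discarded
];

function tokenizeExpr(src) {
  const tokens = [];
  while (src.length > 0) {
    const match = tokenSpec.find(([, re]) => re.test(src));
    if (!match) throw new SyntaxError("Unexpected input: " + src[0]);
    const [type, re] = match;
    const text = src.match(re)[0];
    if (type !== "whitespace") tokens.push({ type, value: text });
    src = src.slice(text.length);
  }
  return tokens;
}

console.log(tokenizeExpr("1 + 2 * (3 - 4)"));
// [ { type: 'number', value: '1' }, { type: 'operator', value: '+' }, ... ]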

The second step in parsing code is syntax analysis, which converts the tokens produced by lexical analysis into an AST. In syntax analysis we learn about context-free grammars (CFG), which can be written down in BNF. A CFG is more powerful than a regular expression, and the extra power is exactly that a CFG can express recursive structure; the most common recursive structures are expressions and code blocks. For this part it is enough to know the basic LL(1) algorithm and to understand top-down parsing (a sketch follows below).
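
Here is a sketch of top-down (recursive-descent) parsing over the tokens from the toy lexer above, for a small BNF-ish grammar I am assuming for illustration: expr -> term (("+" | "-") term)*, term -> factor (("*" | "/") factor)*, factor -> number | "(" expr ")".

// Toy recursive-descent parser for the little grammar above; it consumes
// the token stream from tokenizeExpr() and builds a small AST.
function parseExpr(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function factor() {
    const token = next();
    if (token.type === "number") return { type: "Literal", value: Number(token.value) };
    if (token.type === "lparen") {
      const node = expr();
      next(); // consume ")"
      return node;
    }
    throw new SyntaxError("Unexpected token: " + token.value);
  }

  function term() {
    let node = factor();
    while (peek() && "*/".includes(peek().value)) {
      node = { type: "Binary", operator: next().value, left: node, right: factor() };
    }
    return node;
  }

  function expr() {
    let node = term();
    while (peek() && "+-".includes(peek().value)) {
      node = { type: "Binary", operator: next().value, left: node, right: term() };
    }
    return node;
  }

  return expr();
}

console.log(JSON.stringify(parseExpr(tokenizeExpr("1 + 2 * 3"))));
// Binary "+" of (Literal 1) and (Binary "*" of Literal 2 and Literal 3) –
// note how the grammar's recursion gives multiplication higher precedence.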

Whether it is regexes or CFGs, both use a formal language (our programming languages are themselves formal languages) to describe an abstract structure. While studying them, always keep that abstract structure in mind and you will get twice the result for half the effort.

The parser is the difficult but not particularly important part of compiler principles, so in this part I think you can skip the complicated algorithms entirely; it is not worth spending too much time on them. Parsers can be generated automatically from regexes and CFGs and do not need to be written by hand, so the main purpose of this part is to learn regexes and CFGs; the complicated algorithms matter very little.

Last but not least: regular expressions are context-free grammars, whereas BNF is a regular grammar. Why?

Recommended reading:

  • Misunderstanding Parser – the primer that really got me started
  • The Dragon Book, the Tiger Book, and the Whale Book – the classic compiler textbooks
  • “Of Course I Am Talking Nonsense” – a well-known guru’s blog; just filter out his subjective ranting as you read

Recommended tools:

  • Acorn – a JavaScript parser
  • REGEXPER – displays regular expressions as diagrams
  • Ohm – a visual BNF editor
  • Jison – build a general-purpose parser from regexes and BNF

Programming languages really begin with the AST

In fact, most people’s impression of compiler principles stops at the parser stage, because that is where most people get stuck while learning. The parser, however, is actually the most superficial technique in the field. The AST is where a programming language really begins: only at the AST stage can our computers analyse, interpret, or translate our programming languages; all the code we labour so hard to write is only there for us slow humans to read.

Analysing, transforming, interpreting, and translating the AST ought to be one of the most important parts of compiler principles. But because the classic compiler textbooks were published a long time ago (1985) and focused on the compiled languages popular at the time, chiefly C, they concentrate on two parts: parsing the code and generating assembly. From the perspective of today’s programming languages, with parser generators at the front and LLVM at the back, we should put more emphasis on the middle end.
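
To give a small taste of that middle end (only a sketch; it assumes acorn from the tool list above, installed with npm install acorn): parse a snippet into an ESTree AST and walk it generically to collect every identifier, which is the seed of analyses such as scope tracking or renaming.

// Sketch: parse JavaScript into an ESTree AST with acorn, then walk the
// tree generically to collect identifier names.
const acorn = require("acorn");

const ast = acorn.parse("var a = 123; function add(x, y) { return x + y; }", {
  ecmaVersion: 2020,
});

function collectIdentifiers(node, names = []) {
  if (!node || typeof node.type !== "string") return names;
  if (node.type === "Identifier") names.push(node.name);
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) child.forEach((c) => collectIdentifiers(c, names));
    else if (child && typeof child === "object") collectIdentifiers(child, names);
  }
  return names;
}

console.log(collectIdentifiers(ast)); // [ 'a', 'add', 'x', 'y', 'x', 'y' ]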

Still, what we have covered so far is already enough for most of you to write your own DSL, or a little language of your own that compiles to JavaScript, just for fun. But is that enough? What exactly can we do with an AST? See you next time.

Reference Projects:

  • Bramblex/BLX-FSM – my own toy DSL for describing finite state machines
  • Bramblex/Smooth – my own little toy language

Hello World!