Original post: blog.shipreq.com/post/compil…

Original author: github.com/japgolly

Release date: January 18, 2021

I’ve often heard that writing compilers is hard, unlike writing other kinds of software. A few recent experiences have given me insight into why this is the case, and it has proved quite interesting!

I recently finished work on a big new feature in ShipReq. I’d been working on it for about two months, and it turned out to be the hardest code I’ve ever written in my life. For context: I’ve been coding for decades, on many different projects and for some very large organisations, so when I say this is the hardest code I’ve ever written, it has a lot of competition. Concurrent systems generally have a reputation for being very difficult, and back in the OO era I designed and wrote a large-scale concurrent project from the ground up, including writing all the scheduling and concurrency primitives and managing all the thread-safety myself (it was a different time). I found this piece of work an order of magnitude more difficult.

I said on the ShipReq about page that ShipReq is like a compiler. The big feature I’ve recently completed is the most compiler-like feature I’ve ever written, and I’m surprised that it’s this property specifically that makes it so deceptively difficult. What is the feature? Basically, when you have a bunch of requirements/issues/design documents/whatever, it adds the ability to automatically populate certain fields based on relationships and related fields, so that users don’t have to edit and maintain them manually, which is cumbersome and error-prone. For example, if someone files a bug that needs some development tasks to fix, users can configure their project so that when all those development tasks are marked “done”, the bug itself automatically becomes “ready to test”. The feature itself has no knowledge of “done” or “ready to test”; it lets users define whatever states and rules they want. (That probably doesn’t sound too hard.) This may sound irrelevant to writing a compiler, and requirements, bug reports, and project documentation also seem to have little to do with code. But there are striking similarities. Let’s take a look…

Similarity #1: Relationships

Almost all data relationships in ShipReq are many-to-many and form DAGs. Some views in ShipReq display data as a flat list, but very little is actually stored as a flat list, and there is almost always a way to view the same data as a graph or tree.

If you look closely, source code is the same. Packages have many classes/objects, which have many methods/fields, which have many statements, which have many clauses, and so on. In terms of declarations it’s more one-to-many than many-to-many, but it’s still all one big DAG. Once we start thinking about relationships like imports and methods calling other methods, it becomes many-to-many too, and we get a cyclic graph.

Similarity #2: Interaction

Secondly, this new ShipReq feature is highly user-configurable. That is a major goal of ShipReq: to avoid hard-coded implicit assumptions and find appropriate, configurable abstractions. Obviously, making users configure everything from scratch would be a terrible burden, so we provide a full set of pre-configured, best-practice settings that users can (optionally) adopt and then reconfigure without restriction to best meet their needs. From a coding perspective, this means we never know the automation rules at compile time. The code needs to understand and apply all the rules at run time, and it needs to be able to identify and handle combinations of data and logic that violate invariants. That may not sound like a big deal, but in a complex system there are many invariants, and they don’t stay within their immediate domain boundaries; they propagate through all the relevant graphs and compose with other invariants in varying degrees of complexity.

For example, you might have 10 simple invariants in 10 isolated domains, each domain with around 10 states of its own, but when you combine them all, the number of states to consider becomes 10,000,000,000, and a significant percentage of those combined states may be illegal (i.e. bugs).

It’s just like writing a compiler. For example, the Java compiler doesn’t hard-code your class’s visibility; it lets you set it yourself. ShipReq might use a UI page for configuration, whereas in Java you use modifier keywords: public class Hello and private class Hello are different. From the compiler’s point of view, these are run-time values governed by run-time rules. When we consider additional keywords such as final and abstract, we start to get into what I mean by combinatorial explosion. At its own run time, the Java compiler must ensure that every combination in effect is valid. For example, it will reject the following code:

final public class Hello {
  public abstract void doSomething(); // abstract method in final class not allowed
}

and

abstract class Hello {
  private abstract void doSomething(); // private + abstract not allowed
}

Imagine that you’re writing your own Java compiler. It’s up to you to think of all the illegal combinations and write the code to catch them. I think we all take for granted that marking a method private means it can’t be called from outside, but there’s no magic ensuring that: you’re really just relying on someone else’s if statement. As in my ShipReq example, it’s all run-time value checking, with no guarantees about how all the features interact. The interaction of class/method visibility with final and abstract is mentally simple enough, but every feature has the potential to interact badly with something else, and it’s not always easy to identify those combinations.
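
To make “relying on someone else’s if statement” concrete, here is a toy sketch (purely hypothetical, and nothing like javac’s real internals) of how such modifier rules reduce to plain run-time checks over run-time values:

```scala
// Toy model of a few modifiers; a real compiler tracks many more.
final case class Modifiers(isFinal: Boolean, isAbstract: Boolean, isPrivate: Boolean)

// Each illegal combination is caught by nothing more than an if statement.
def checkMethod(cls: Modifiers, method: Modifiers): List[String] =
  List(
    Option.when(cls.isFinal && method.isAbstract)(
      "abstract method in final class not allowed"),
    Option.when(method.isPrivate && method.isAbstract)(
      "private + abstract not allowed")
  ).flatten
```

Every newly added modifier has to be checked against all the existing ones, which is exactly where the combinatorial explosion bites.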

I recently saw a good example in a PR to Scala. The private and sealed keywords have been around for a long time, and only recently did someone realise that private classes are effectively sealed. Scala has been around for 16 years! When you have hundreds of features, manually considering every possible combination doesn’t seem feasible: 100 features produce 4,950 unique pairs. And there’s nothing special about pairs; some problems may only arise when three specific features intersect. For 100 features there are:

  • 4,950 unique pairs
  • 161,700 unique triples
  • 3,921,225 unique quadruples
  • 17,310,309,456,440 unique sets of 10

Out of 100 hypothetical features, it’s not impossible for some set of 10 to interact poorly with each other. You just can’t reason about that manually.
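
For the record, the figures above are just binomial coefficients; a couple of lines of Scala reproduce them:

```scala
// C(n, k): the number of unique k-feature subsets among n features.
// Multiplying before dividing keeps every intermediate value an exact integer.
def choose(n: Int, k: Int): BigInt =
  (1 to k).foldLeft(BigInt(1))((acc, i) => acc * (n - k + i) / i)

choose(100, 2)  // 4,950
choose(100, 3)  // 161,700
choose(100, 10) // 17,310,309,456,440
```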

Similarity #3: Type unsafety

Type safety is a pillar I’ve relied on for ages. Now, when you think of type safety, please don’t think of the Java type system; that’s not really what I mean, although it’s a start. I’m talking more about the type systems in languages like Scala and Haskell, with features like existential and dependent types. Type systems like Java’s and TypeScript’s are great! They add value. However, you can achieve another order of magnitude or two of safety with more advanced type systems, and it’s very beneficial once you learn how to make good use of them.

Unfortunately, type safety didn’t help me much in developing my big feature. Say you have a field that can accept a set of values. At compile time, every possible run-time value is acceptable, because there is always some configuration that makes every possible value legal. It’s like having a name field of type String, plus a run-time rule specifying the letter the name must start with: firstNameLetter: Option[Char]. (Unrealistic, but it illustrates the concept.) In such cases, you must give the name field a type that admits all possibilities, such as name: String. You could cleverly create a type that encodes a proof that the name conforms to the rule, but that would just be a derivative of (name: String, firstNameLetter: Option[Char]); unless the name and the configuration can be modified atomically as a single compound value, you still need to store the two values separately before deriving it. The first-letter restriction is a toy example, but it highlights the fact that your run-time configuration lives in one place, your run-time data in another, and there is no compile-time type safety to enforce that the right things happen at run time.
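
Here is a minimal sketch of that toy rule (hypothetical code, not ShipReq’s): because the configuration and the data live in two separate values, the type system cannot stop them from disagreeing, and only an explicit run-time check can enforce the invariant.

```scala
final case class Config(firstNameLetter: Option[Char]) // run-time rule
final case class Person(name: String)                  // run-time data

// The invariant "name starts with the configured letter" can only be
// enforced here, at run time, with an explicit check.
def validate(cfg: Config, p: Person): Either[String, Person] =
  cfg.firstNameLetter match {
    case Some(c) if !p.name.headOption.contains(c) =>
      Left(s"Name '${p.name}' must start with '$c'")
    case _ =>
      Right(p)
  }
```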

Compilers have the same problem. They maintain a large run-time database, and it is entirely up to the compiler authors to apply all the rules, move data in and out of scope based on other run-time conditions, and so on. Unlike us compiler users, they don’t get to lean on a type system (much). They’re on their own, able to use type safety to protect only a small portion of their hugely complex domain.

Similarity #4: Feedback

This is a necessary part of the process: things will go wrong, and users need to get meaningful feedback.

In the case of ShipReq, it would…

  • Detect changes that are always wrong and prevent users from making them. These are errors.
  • Detect data/configuration that has become invalid, modify the derivation accordingly, and present it to the user in an easily understandable way. These are warnings.
  • Gather all the information necessary to explain to users how, why, and from where derived data comes. It’s one thing to automatically populate/modify a user’s data, but I believe users should be able to inspect and understand things like which steps were performed and which data was a factor. That’s provenance. It doesn’t sound difficult, but in practice it’s surprisingly challenging.

Compilers do the same thing. They…

  • They detect invalid data and data combinations at run time and explain them to the user. These are errors.
  • They detect data that is acceptable but potentially faulty, handle it (perhaps by ignoring statements, or modifying the generated code in some way), and flag it to the user. These are warnings.
  • Some compilers work out how something came to be and present it, usually in a compilation error. Some compilers produce 10-30 lines of error messages because they provide so much information about context and the computation up to that point. That’s a form of provenance.

As with the point above, it takes a dedicated person to write the four trillion if-conditions (not literally), collect all kinds of run-time data at very strategic locations, and perform a lot of conditional data transformations to be able to explain a much larger underlying data set concisely.

Similarity #5: Performance

Slow compilation is frustrating for everyone. Scala compilation has become significantly faster every couple of years and continues to improve, yet among many people it still has a bad reputation for being slow. I think Rust is starting to acquire that reputation right now. Go advocates boast that super-fast compilation is one of the language’s main selling points. Speed matters a lot when you’re writing a compiler.

The feature I wrote has the same requirement. It may need a full recalculation on every user change, and everything that happens on each change needs to be fast.

I’ve been a big fan of FP (functional programming) for about 9 years, but for this feature I had to use a lot of mutable state and mutation. The beauty of FP is that you can encapsulate mutable code behind an immutable, referentially transparent interface, so that the “atoms” of the work are still FP even though there’s mutation under the shell. That’s exactly what I did, and I got to have my cake and eat it too: the code was super fast, and the entire “compile” step was atomic and pure, which meant I didn’t sacrifice FP. In other words, the same immutable data goes in and the same immutable result comes out; I just happen to use a lot of mutability to produce that immutable result. This, by the way, is a wonderful property of Scala and its multi-paradigm nature. I’m sure Haskell evangelists (Haskell being, technically, an FP-only language) would argue that you can get the same speed in Haskell, but I’m not sure whether that’s true or by what means. Would the code just do exactly the same thing, but with more boilerplate via IORefs or giant StateT stacks? Would Haskell’s optimisations be so effective that a more direct FP approach would be fast enough? I honestly don’t know, but I’d like to. I suspect the internals of the Haskell compiler are a treasure trove of deep and awesome insights. But I digress.
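
The pattern can be sketched generically (an illustration of the technique, not ShipReq’s actual code):

```scala
// Same immutable input in, same immutable output out; the mutable map
// is purely local and never escapes, so callers see a pure function.
def frequencies(words: List[String]): Map[String, Int] = {
  val m = scala.collection.mutable.Map.empty[String, Int]
  words.foreach(w => m.update(w, m.getOrElse(w, 0) + 1)) // local mutation
  m.toMap                                                // freeze before returning
}
```

Callers cannot observe the mutation, so the function composes like any other pure function while running at mutable-collection speed.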

With the possible exception of Haskell, writing a fast compiler means having a lot of mutable state, and the difficulty of reasoning about it and keeping it correct multiplies. Painful as it was, this was something I had to do to achieve lightning performance (and that’s after a lot of very strategic caching), and nearly every compiler I know of does the same.

Solution (1/3)

We’ve talked a lot about the difficulties, but what about solutions? How do you make sure such complex code really works? Here is my approach.

The first thing to do is to make your tests, especially your test data representation, as easy to read and write as possible.

For example, say your result data looks like this:

final case class Result(accounts: Set[Account])

final case class Account(id     : Long,
                         name   : String,
                         roles  : Set[Role],
                         aliases: Set[String])

sealed trait Role
object Role {
  case object Admin   extends Role
  case object User    extends Role
  case object Offline extends Role
}

You might be tempted to start writing tests like this.

val result   = doStuff(...)
val accounts = result.accounts.toVector.sortBy(_.id)
assertEqual(accounts.length, 3)

assertEqual(accounts(0),
  Account(2, "David", Set(Role.Admin, Role.User),
    Set("Baz", "Bazza", "Dave", "Davo")))

assertEqual(accounts(1),
  Account(7, "Bob", Set(Role.User), Set.empty))

assertEqual(accounts(2),
  Account(13, "John", Set(Role.User, Role.Offline),
    Set("Jono", "Johnny")))

Don’t do it!

  • It’s painful to write
  • You only get a subset of failures at a time. Maybe it’s “4 is not 3”; maybe it’s one failing account when you have several
  • When an assertion fails (say Davo is missing), it’s hard to tell from the resulting Account what exactly went wrong, especially with big data
  • If you change the data structures, you have to change every test. Imagine that while writing your 51st test case you need to change Set[Role] into a new class called Roles. That’s 50 tests (and who knows how many occurrences) you need to update manually.

Instead, be kind to yourself and your team. Take an hour or two to make your own life easier. You’ll be writing a lot of tests, so make it as easy as possible to:

  • Write new tests
  • Modify existing tests
  • Understand test failures
  • Read and understand test cases

In the example above, we’d write a function that converts a Result to a String as follows:

  • Sort accounts by id
  • Represent a Set[Role] as an auo flag string for Admin, User, and Offline, just like rwx permissions in Linux
  • When aliases are present, sort them and print them YAML-style

Also, if your test library doesn’t handle multi-line strings well, find one that does. Ideally, when two multi-line strings don’t match, you want to see colored side-by-side comparisons.
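
The formatForTests function used below isn’t shown in the original post, but following the three rules above it could look something like this (a sketch; the exact layout is illustrative):

```scala
// Definitions repeated from the example above:
final case class Result(accounts: Set[Account])
final case class Account(id: Long, name: String, roles: Set[Role], aliases: Set[String])
sealed trait Role
object Role {
  case object Admin   extends Role
  case object User    extends Role
  case object Offline extends Role
}

// Sort by id, render roles rwx-style (a=Admin, u=User, o=Offline),
// and print sorted aliases YAML-style when present.
def formatForTests(r: Result): String =
  r.accounts.toVector.sortBy(_.id).map { a =>
    val flags = Vector(Role.Admin -> 'a', Role.User -> 'u', Role.Offline -> 'o')
      .map { case (role, ch) => if (a.roles.contains(role)) ch else '-' }
      .mkString
    val aliases =
      if (a.aliases.isEmpty) ""
      else "\n  Aliases:" + a.aliases.toVector.sorted.map("\n  - " + _).mkString
    s"#${a.id}) ${a.name} [$flags]$aliases"
  }.mkString("\n\n")
```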

Our new test will look like this.

def assertResult(actual: Result, expect: String): Unit =
  assertEqual(formatForTests(actual), expect.trim)

val result = doStuff(...)

def expect =
  """#2) David [au-]
    |  Aliases:
    |  - Baz
    |  - Bazza
    |  - Dave
    |  - Davo
    |
    |#7) Bob [-u-]
    |
    |#13) John [-uo]
    |  Aliases:
    |  - Jono
    |  - Johnny
  """.stripMargin

assertResult(result, expect)

Much better!

  • Faster to write
  • Faster to modify
  • Failures are easy to understand (especially with coloured side-by-side diffs)
  • We can change the data structures without changing all the tests
  • When a test fails, we see everything at once, succinctly

Now go unit test! Write all the tests you normally would. If you need generated test input, put the same effort into making that as easy as possible. Write a test for every questionable data combination you can think of, and be sure to add comments describing why each combination is problematic, because it’s not always obvious.

Solution (2/3)

Once you’ve passed all your unit tests, it’s time to move on to the next step: property testing.

If you don’t know what property testing is, please look it up, because it is an invaluable tool. It’s life-changing.

Think of all the properties (i.e. invariants) that the resulting data should satisfy. I find it much easier to think of properties for compiler-like code than for normal code, probably because data and data relationships are so central in this domain.

Even with our toy account example above, you can come up with properties like:

  • Ids should be unique
  • Names should be unique
  • An alias must never be the same as another user’s name (perhaps the code should filter such aliases out)
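
Such properties are cheap to express as plain predicates (a hypothetical sketch using a simplified Account; a property-testing library such as ScalaCheck or Nyaya would then evaluate them against thousands of randomly generated results):

```scala
final case class Account(id: Long, name: String, aliases: Set[String])

// Property 1: ids are unique across all accounts.
def idsAreUnique(as: Vector[Account]): Boolean =
  as.map(_.id).distinct.length == as.length

// Property 2: names are unique across all accounts.
def namesAreUnique(as: Vector[Account]): Boolean =
  as.map(_.name).distinct.length == as.length

// Property 3: no alias collides with another user's name.
def aliasesAvoidOtherNames(as: Vector[Account]): Boolean =
  as.forall { a =>
    val otherNames = as.filter(_.id != a.id).map(_.name).toSet
    a.aliases.forall(al => !otherNames.contains(al))
  }
```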

If you’re working with a graph rather than a list, there are usually properties each node should satisfy with respect to its children or parents.

Make sure your property-testing library can reproduce its failures. Each time it finds one, extract the data and add it as a unit test. My OSS library Nyaya has a feature called bug-hunt that lets me tell it, “Hey, run n tests on m threads starting from seed X, one test per seed.” When I ran it over the ShipReq feature above, it found lots of problems, usually within 30 seconds, until eventually it ran for an hour or two without finding any. I can’t guarantee there aren’t still some crazy bugs in my code, but after all my unit tests and hours of property testing, I can at least rest assured that it’s mathematically very unlikely. That’s about the best you can get for reasonable effort.

Solution (3/3)

If you want, you can also invest unreasonable effort and go even further: you can try to write a formal, machine-verifiable proof. Once you have one, you can guarantee that your logic works in all cases without any unexpected errors (assuming you implement the proven logic correctly). While that’s an outcome we’d all like, it’s rarely feasible, because it can take years of work to complete.

An interesting example is Scala. It has been around for many years, and over those years the community has hit many bugs. These days compiler bugs are rarely encountered, because there is an ongoing effort to fix them. But one of the most exciting things about the upcoming Scala 3 is that, after nearly a decade of work by several PhD students, the foundation of the Scala language has been formally proven. It doesn’t cover the complete language yet; for example, the interaction between higher-kinded types and subtyping is still an open research question. That’s not specific to Scala, by the way: any language that tries to combine those two features has the same problem. We never really know whether there are bugs remaining; we only know that everything added to the language’s test suite works, until proven otherwise. I like that Scala has a so-called “community build” that compiles and tests hundreds (or more) of the most popular OSS Scala libraries and frameworks; it’s a great way to catch regressions. But now I’m really digressing. The point is: formal proof is very, very hard.

I should also mention that you can use TLA+ to get proofs with much less effort. You basically build a highly abstract state machine, and it forces a proof by testing all possible combinations. In the right circumstances it’s amazing! TLA+ will find edge cases no matter how rare, usually in a matter of seconds. I’ve used it with ShipReq to verify things like eventual consistency and to ensure users never see stale data. You can spend a couple of days and be sure your logic works in all possible situations. But for verifying compiler-like code (which I haven’t tried yet), you’d probably have to sit down and think very, very hard about how to model your logic in a compatible and efficient way; the state space is likely to be too large. Depending on what you’re doing it might work, so I’d at least suggest considering it.

What else am I missing? Sorry if the solutions I’ve presented are a little unusual. If you can think of another, I’d really like to know, so please let me and other readers know in the comments on Reddit or HackerNews!

Conclusion

I wanted to write this for a couple of reasons. I’ve heard compiler writers say that writing a compiler is very different from writing anything else, but I never quite understood why. Unfortunately, I’ve also seen many examples of people being rude and mocking certain compiler writers and languages over trade-offs or bugs. Some more moderate people have said things like “Wouldn’t FP easily solve problem X?” or “Why is there so much mutability? This is clearly a recipe for disaster,” and so on. While those seem like reasonable positions, I often found myself thinking that compiler writers smart enough to create such complex systems must already know about these strategies and their benefits. So what’s really going on?

I think I have the answer now, and I hope this article and my reflections give you some insight too. The work I did was trivial compared to the complexity of a full language compiler, and yet it was the hardest code I have ever written (and I’ve been coding nearly all my life, over 30 years!).

I hope you find these insights as interesting as I do. I think I’ve come closer to understanding the idea that writing a compiler is very different from writing anything else. But I’d actually go a step further and say that writing any highly configurable, highly flexible system is very hard.

Thank you!

We all stand on the shoulders of giants. Remember to stop and appreciate the work and effort that goes into the tools we rely on every day. Think of all the if statements your language’s compiler must contain to enforce what we all consider obvious. All the complexity we rely on is built up in layers, each on top of simpler primitives. So, if you can, maybe buy the authors of your everyday language a coffee, or just show your appreciation with a nice thank-you message.

I’m so grateful to all the people whose work I’ve built upon. Part of what drives me personally is wanting ShipReq to provide a new generation of efficiency and quality that my users can take for granted, so that they can say to themselves, “Yeah, technology has evolved again; this 2021 technology is my baseline now,” and then go on to build what they consider revolutionary. It’s a beautiful cycle.
