Paper appreciation: Safe Systems Programming in Rust

Translators: MATRIXKOO, Zhang Handong

Original paper: cacm.acm.org/magazines/2…

Editor’s note:

This article was published in Communications of the ACM, April 2021.

The first author is Ralf Jung of the Max Planck Institute for Software Systems (MPI-SWS). Ralf Jung is also a member of the official Rust team, working on improving Miri to provide UB (undefined behavior) checking for unsafe Rust, and he is the initiator of RustBelt, the project for formally verifying Rust’s safety model.

Key insights

  • Rust is the first industry-supported language to overcome the longstanding tradeoff between the safety guarantees of higher-level languages and the control over resources provided by lower-level systems programming languages.
  • It addresses this challenge with a strong type system based on the principles of ownership and borrowing, which statically prohibits the mutation of shared state. This approach allows many common systems programming pitfalls to be detected at compile time.
  • There are many data types whose implementations fundamentally depend on shared mutable state and thus cannot follow Rust’s strict ownership discipline. To support such data types, Rust judiciously embraces the use of unsafe code: safe APIs that encapsulate unsafe code within them.
  • As part of the RustBelt project, the proof technique of semantic type soundness, together with advanced separation logic and machine-checked proof technology, has enabled us to build a rigorous and reliable foundation for Rust’s safety claims.

Translator’s Note: RustBelt Project: plv.mpi-sws.org/rustbelt/#p…

A project dedicated to ensuring Rust security with formal proofs

Programming language design has long been torn between two seemingly irreconcilable desires.


  • Safety. We want a type system that statically rules out large classes of errors. We want automatic memory management. We want data encapsulation, so that we can enforce invariants on the private representations of objects and be sure they cannot be corrupted by untrusted code.
  • Control. At least for systems programs such as web browsers, operating systems, or game engines, where performance and resource constraints are a primary concern, we want to understand the byte-level representation of data. We want to optimize the time and space usage of our programs using low-level programming techniques. We want access to the “bare metal” when we need it.

However, according to conventional wisdom, you cannot have your cake and eat it too. Languages like Java give us strong safety, but at the expense of control over the underlying machine. As a result, for many systems programming applications, the only realistic option is to use a language like C or C++ that provides fine-grained control over resource management. However, such control comes at a steep cost. For example, Microsoft recently reported that 70 percent of the security vulnerabilities they fix are due to memory-safety violations, 33 all problems that a strong type system could have ruled out. Similarly, Mozilla reports that the vast majority of critical bugs they find in Firefox are memory-related. 16 If only there were a way to have it both ways: safe systems programming with control over the underlying machine…

Enter Rust. Sponsored by Mozilla and actively developed over the past decade by a diverse community of contributors, Rust supports many common low-level programming idioms and APIs familiar from modern C++. However, unlike C++, Rust enforces the safe use of these APIs with a strong static type system.

In particular, Rust, like Java, protects programmers from memory-safety problems (for example, use-after-free bugs). But Rust goes a step further, defending programmers against other, more insidious anomalies that no other mainstream language prevents. Consider, for example, data races: unsynchronized accesses to shared memory from multiple threads, at least one of which is a write. Even though data races effectively constitute undefined (or weakly defined) behavior for concurrent code, most safe languages (such as Java and Go) permit them, and they are a reliable source of concurrency bugs. 35 In contrast, Rust’s type system rules out data races at compile time.

Rust has been steadily gaining popularity, so much so that many major industrial software vendors (Dropbox, Facebook, Amazon, and Cloudflare, for example) now use Rust internally, and Rust has topped Stack Overflow’s list of most loved programming languages for the past five years. Microsoft’s Security Response Center team recently announced that it is actively exploring the use of Rust at Microsoft to reduce security vulnerabilities in systems software. 8, 25

Rust’s design draws deeply from academic research on safe systems programming. In particular, the most distinctive feature of Rust’s design, compared with other mainstream languages, is its adoption of an ownership type system (often referred to in the academic literature as an affine or substructural type system 36). Ownership type systems help programmers enforce safe patterns of low-level programming by restricting which aliases (references) to an object may be used to mutate it at any given point in the program’s execution. But Rust goes beyond the ownership type systems of prior work in at least two new and exciting ways:

  1. Rust employs the mechanisms of borrowing and lifetimes, which make it much easier to express common C++-style idioms and ensure that they are used safely.
  2. Rust also provides a rich set of APIs (for example, for concurrency abstractions, efficient data structures, and memory management) that fundamentally extend the power of the language by supporting more flexible combinations of aliasing and mutation than Rust’s core type system permits. Correspondingly, these APIs cannot be implemented within the safe fragment of Rust: internally, they make use of potentially unsafe, C-style features of the language, but encapsulated behind a safe API that is claimed not to disturb Rust’s language-level safety guarantees.

These aspects of Rust’s design are not only critical to its success, they also raise fundamental research questions about its semantics and security guarantees that the programming language community is only beginning to explore. In this article, we begin by providing readers with a bird’s eye view of the Rust programming language, highlighting some of the basic features that make Rust stand out from its contemporaries. Second, we describe the initial progress of RustBelt, an ongoing project funded by the European Research Council (ERC) that aims to provide the first formal (and machine-checked) basis for Rust’s safety claims. Through this project, we hope to inspire other members of the computer science research community to start paying more attention to Rust and contribute to the development of this ground-breaking language.

Motivation: Pointer invalidation in C++

To demonstrate the kind of memory-safety problem that is common in systems programming languages, let us consider the C++ program at the top of [Figure 1], which exhibits a use-after-free bug.

Figure 1. Use-after-free bug in C++ and how the bug is prevented in Rust.

In the first line, the program creates a std::vector of integers. The initial contents of v, the two elements 10 and 11, are stored in a buffer in memory. In the second line, we create a pointer vptr into that buffer; specifically, it points to the location of the second element (currently 11). Now v and vptr both point to (overlapping parts of) the same buffer; we say the two pointers are aliasing. In the third line, we push a new element to the end of v: the element 12 is appended after 11 in the buffer. If there is no room for an additional element, a new buffer is allocated and all existing elements are moved over. Let us assume that this is what happens here. Why is this interesting? Because vptr still points to the old buffer. In other words, adding a new element to v has turned vptr into a dangling pointer. This is possible because the two pointers are aliasing: an operation on one pointer (v) will, in general, also affect all of its aliases (vptr). [Figure 1] visualizes the whole situation.

The fact that vptr is now a dangling pointer becomes a problem in the fourth line, where we dereference vptr: because it is dangling, this is a use-after-free bug.

In fact, the problem is so common that one instance of it has its own name: iterator invalidation, the situation in which an iterator (usually implemented internally with a pointer) is invalidated because the data structure being iterated over is mutated during the iteration. It most commonly arises when one iterates over a container in a loop and then indirectly, but accidentally, invokes an operation that mutates the container. Note that in practice the call to the mutating operation (push_back in line 3 of our example) might be deeply nested behind several layers of abstraction. Especially when refactoring code or adding new functionality, it is therefore often near impossible to determine whether pushing to a given vector will invalidate pointers elsewhere in the program.

**Comparison with garbage-collected languages.** Languages such as Java, Go, and OCaml avoid use-after-free bugs through garbage collection: memory is freed only when the program can no longer use it, so there can be no dangling pointers and hence no use-after-free. One problem with garbage collection is that, for efficiency, such languages generally do not permit interior pointers (that is, pointers into a data structure). For example, an array int[] in Java is represented much like a std::vector<int> in C++ (except that Java arrays cannot grow), but unlike in C++, one can only get and set elements of a Java array, not obtain a pointer into it. Moreover, in Java the analogue of our iterator-invalidation bug surfaces as a runtime exception rather than the undefined behavior of the corresponding C++ program. That is better than corrupting memory, but it still leaves a lot to be desired: we want to prevent such errors from occurring in the first place (at compile time), rather than shipping incorrect code and detecting the problem only at run time.

Rust’s solution to pointer invalidation. In Rust, issues such as iterator invalidation are detected statically by the compiler, resulting in a compile-time error rather than a run-time exception. To explain how this works, consider the Rust translation of our C++ example at the bottom of [Figure 1]. As in the C++ version, there is a buffer in memory, and vptr points into the middle of that buffer (creating aliasing); push might reallocate the buffer, which would turn vptr into a dangling pointer and cause a use-after-free in line 4.

But none of that can happen. Instead, the compiler reports an error: v cannot be borrowed as mutable more than once at a time. We will come back to borrowing shortly, but the key idea (the mechanism through which Rust achieves memory safety in the presence of pointers into data structures) is already visible here: the type system enforces the rule (which we will make more precise later) that there can be no aliasing of a mutable reference. In the context of concurrency, this principle should sound familiar, and indeed Rust uses it to ensure the absence of data races. But as our example of a program rejected by the Rust compiler shows, the unrestricted combination of aliasing and mutation is a recipe for disaster even in sequential programs: in line 3, we use a mutable reference to v while vptr aliases v (v is considered to alias everything within it, hence overlaps with vptr), and it is exactly this combination that would lead to the memory-access error in line 4.
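The borrow-checking behavior just described can be reproduced in a small, self-contained sketch (the function name and the extra arithmetic are illustrative, not the exact code of [Figure 1]; the line the compiler rejects is left as a comment):

```rust
// Sketch: a mutable borrow into a Vec must end before the Vec
// itself may be mutated again.
fn mutate_then_push(mut v: Vec<i32>) -> i32 {
    let vptr = &mut v[1]; // mutable borrow pointing *into* the buffer
    *vptr += 1;           // last use of `vptr`: its borrow ends here
    // Calling v.push(12) *before* the last use of `vptr` is rejected:
    // error[E0499]: cannot borrow `v` as mutable more than once at a time
    v.push(12);           // fine: the borrow of `vptr` has already ended
    *v.last().unwrap()
}

fn main() {
    // v[1] becomes 12, then 12 is pushed; the function returns 12.
    println!("{}", mutate_then_push(vec![10, 11]));
}
```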

Ownership and borrowing

Rust’s core mechanism for preventing the uncontrolled aliasing of mutable state is ownership. Memory in Rust always has a unique owner, as demonstrated by the example in [Figure 2].

Figure 2. Rust example: Moving ownership.

Here, we construct v much as in our first example and then pass it to consume. Operationally, just as in C++, the parameter is passed by value, but the copy is shallow: the pointers are copied, but the data they point to is not duplicated. This means that v and w point to the same buffer in memory.

Such aliasing would be a problem if the program went on to use both v and w, but an attempt to do so in line 6 triggers a compile-time error. This is because Rust considers ownership of v to have moved to consume as part of the call: consume may do whatever it likes with w, and the caller may no longer access the memory of this Vec at all.
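A minimal sketch of this move semantics (the body of consume here is illustrative and may differ from the version in [Figure 2]):

```rust
// Sketch: passing a Vec by value transfers ownership of its buffer.
fn consume(w: Vec<i32>) -> usize {
    w.len()
} // `w` goes out of scope here; its buffer is freed automatically

fn main() {
    let v = vec![10, 11];
    let n = consume(v); // ownership of `v` moves into `consume`
    // println!("{:?}", v); // error[E0382]: borrow of moved value: `v`
    println!("consume saw {} elements", n);
}
```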

**Resource management.** Rust’s ownership discipline not only prevents memory errors; it also forms the core of Rust’s approach to memory management (and, more broadly, resource management). When a variable that owns memory (for example, a variable of type Vec<i32>, which owns the buffer backing the vector) goes out of scope, we know for sure that the memory is no longer needed, so the compiler can automatically free it at that point. To this end, as in C++, the compiler transparently inserts destructor calls. For example, in the consume function, it is not actually necessary to call the destructor (drop) explicitly: we could leave the function body empty, and w would still be deallocated automatically.

As a result, Rust programmers rarely have to worry about memory management: it is largely automatic, despite the absence of a garbage collector. Moreover, the fact that memory management is also static (determined at compile time) yields enormous benefits: it not only helps reduce peak memory consumption, it also provides good worst-case latency in reactive systems such as web servers. On top of that, Rust’s approach extends beyond memory management: file descriptors, sockets, locks, and other resources are handled with the same mechanism, so Rust programmers need not worry about closing files or releasing locks either. C++ pioneered this use of destructors for automatic resource management in the form of RAII (Resource Acquisition Is Initialization); 31 the main difference in Rust is that the type system statically ensures that resources are no longer used after they have been destroyed.
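The destructor mechanism can be observed directly; in the sketch below, the Resource type is hypothetical, and a counter stands in for releasing an actual resource:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times the destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Resource; // a stand-in for a file descriptor, socket, lock, ...

impl Drop for Resource {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst); // "release" the resource
    }
}

fn use_resource() {
    let _r = Resource;
    // work with `_r` ...
} // `_r` goes out of scope; the compiler inserts the destructor call here

fn drops_so_far() -> usize {
    DROPS.load(Ordering::SeqCst)
}

fn main() {
    use_resource();
    println!("destructor ran {} time(s)", drops_so_far());
}
```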

**Mutable references.** The strict ownership discipline is pleasantly simple, but unfortunately not very convenient to work with. Often, we want to provide data to a function temporarily and get it back when the function is done. For example, we want v.push(12) to grant push the privilege of mutating v, but we do not want it to consume the vector v.

In Rust, this is achieved by borrowing, which takes much of its inspiration from previous work on Region types. 13, 34

In [Figure 3], the add_something function takes an argument of type &mut Vec<i32>, which represents a mutable reference to a Vec<i32>. Operationally, this acts just like a reference in C++: the Vec is passed by reference. In the type system, it is interpreted as add_something borrowing ownership of the Vec from the caller.

Figure 3. Rust example: Mutable references.

The add_something function shows what well-behaved borrowing looks like. To understand why this code is accepted even though the compiler rejected our earlier pointer-invalidation example, we have to introduce another concept: lifetimes. Just as in real life, misunderstandings when borrowing can be avoided by agreeing up front on how long something may be borrowed. So when a reference is created, it is assigned a lifetime, which is recorded in its full reference type: &'a mut T is a reference with lifetime 'a. The compiler ensures that the reference (v, in our example) is used only during that lifetime, and that the referent is not used again until the lifetime has ended.

In our example, the lifetimes (all inferred by the compiler) last just for the duration of the calls to add_something and Vec::push, respectively. v is not used again until the lifetimes of these borrows have ended.
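A sketch of what [Figure 3] describes, under the assumption that add_something simply pushes one element (the exact element is immaterial):

```rust
// Sketch: `add_something` borrows the vector mutably rather than
// consuming it, so the caller keeps ownership.
fn add_something(v: &mut Vec<i32>) {
    v.push(11); // `Vec::push` reborrows `v` for the duration of the call
}

fn main() {
    let mut v = vec![10];
    add_something(&mut v); // the borrow ends when the call returns
    v.push(12);            // so the owner may mutate `v` again
    println!("{:?}", v);
}
```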

In contrast, [Figure 4] shows the lifetime inferred for our earlier pointer-invalidation example from [Figure 1]. vptr is a borrow with lifetime 'a that starts in line 2 and extends through line 4. It cannot be any shorter, because vptr is used in line 4. Consequently, in line 3, v is used while it is still borrowed, which is an error.

Figure 4. Use-after-free example with inferred reference lifetime.

To summarize: whenever something is passed by value (as in consume), Rust interprets this as a transfer of ownership; when a mutable reference is passed (as in add_something), Rust interprets this as borrowing for a certain lifetime.

**Shared references.** Following the principle of no mutation of aliased state, mutable references are unique pointers: they do not permit aliasing. To complete this picture, Rust has a second kind of reference, the shared reference, written &Vec<i32> or &'a Vec<i32>, which does permit aliasing but rules out mutation. One of the primary use cases for shared references is sharing read-only data between multiple threads, as shown in [Figure 5].

Figure 5. Rust example: Shared references.

Here, we create a shared reference vptr that points to (and borrows) v[1]. The vertical bars form a closure with no arguments (sometimes called an anonymous function or lambda). The two closures are passed to join, which is Rust’s version of parallel composition: it takes two closures, runs them in parallel, waits until both are done, and returns both of their results. When join returns, the borrow ends, so we can mutate v again.

Just like mutable references, shared references have a lifetime. Under the hood, the Rust compiler uses the lifetime to track the period during which v is temporarily shared between the two threads. At the end of that lifetime (line 5), v’s original owner regains full control. The key difference is that multiple shared references to the same object may coexist during the same lifetime, as long as they are used only for reading, not writing. We can see this rule enforced by changing one of the two threads in the example to || v.push(12): the compiler reports an error, telling us that we cannot have a mutable reference and a shared reference to the same Vec at the same time. Indeed, that program would have a fatal data race between the reading thread and the thread executing v.push(12), so it is crucial that the compiler detects such cases statically.
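The join function in [Figure 5] comes from a parallelism library; the sketch below uses std::thread::scope from the standard library instead, which provides the same structured-concurrency guarantee (both threads are joined before the scope returns):

```rust
use std::thread;

// Sketch: two threads read concurrently through a shared reference.
fn read_in_parallel(v: &Vec<i32>) -> (i32, i32) {
    let vptr = &v[1]; // shared reference pointing *into* the buffer
    thread::scope(|s| {
        let a = s.spawn(|| *vptr); // both threads may use the shared
        let b = s.spawn(|| *vptr); // reference at the same time
        (a.join().unwrap(), b.join().unwrap())
    })
}

fn main() {
    let mut v = vec![10, 11];
    let (a, b) = read_in_parallel(&v);
    println!("both threads read {} and {}", a, b);
    v.push(12); // the borrow has ended; the owner regains full control
    println!("{:?}", v);
}
```

Replacing one of the spawned closures with `|| v.push(12)` makes the program fail to compile, exactly as described above.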

Shared references are also useful in sequential code; for example, while iterating over a Vec with a shared iterator, we can still pass a shared reference to the entire Vec to another function. In this article, however, we focus on the use of shared references for concurrency.

**Summary.** To ensure safety, the Rust type system enforces the discipline that a reference is never simultaneously aliased and mutable: owning a value of type T means fully controlling it; it can be borrowed either through a unique mutable reference (&mut T) or through shared (read-only) references (&T).

Relaxing Rust’s strict ownership discipline with safe APIs

Rust’s core ownership discipline is flexible enough to account for many low-level programming idioms. But for implementing certain data structures, it can be too restrictive. For example, without any aliasing of mutable state, it is impossible to implement a doubly linked list, in which each node is referenced by both its next and its previous node.

Rust adopts a somewhat unusual approach to this problem. Rather than complicating its type system to account for data structures that do not fit the ownership discipline, or imposing dynamic checks to enforce safety at run time, Rust permits its ownership discipline to be relaxed through the development of safe APIs: APIs that extend the expressive power of the language by providing safely controlled use of aliased mutable state. Although the implementations of these APIs do not comply with Rust’s strict ownership discipline (a point we return to later), the APIs themselves make essential use of Rust’s ownership and borrowing mechanisms to ensure that they preserve Rust’s safety guarantees as a whole. Let us now look at a few examples.

**Shared mutable state.** Rust’s shared references allow multiple threads to read shared data concurrently. But threads that only read data are just half the story; next, we will see how the Mutex API makes it possible to safely share mutable state across threads. At first glance, this might seem to contradict everything we have said so far about Rust’s safety: isn’t the whole point of Rust’s ownership discipline to prevent mutation of shared state? Indeed it is, but we will see how Mutex restricts that mutation sufficiently to rule out violations of memory and thread safety. Consider the example in [Figure 6].

Figure 6. Rust example: Shared mutable state.

We again use structured concurrency and shared references, but now we wrap the Vec in a Mutex: the variable mutex_v has type Mutex<Vec<i32>>. The key operation on a Mutex is lock, which blocks until it can acquire exclusive access to the lock. When the variable v holding the lock guard goes out of scope, the lock is implicitly released by v’s destructor. In the end, the program prints [10,11,12] if the first thread manages to acquire the lock first, and [10,11] if the second thread does.

To see how this example type-checks, let us take a closer look at lock. Its type is (almost) fn(&'a Mutex<T>) -> MutexGuard<'a, T> (the actual return type is wrapped in a LockResult<…> for error handling, which we ignore here). That is, lock can be called with a shared reference to the Mutex, which is why Rust allows us to call lock from both threads: both closures capture a &Mutex<Vec<i32>>, and, just as with vptr of type &i32 in our first concurrency example, both threads can use that shared reference at the same time. In fact, it is essential that lock take a shared rather than a mutable reference; otherwise, two threads could never attempt to acquire the lock at the same time, and there would be no need for a lock in the first place.

The return type of lock, MutexGuard<'a, T>, behaves essentially like &'a mut T: it grants exclusive access to the T stored inside the lock. Moreover, when it goes out of scope, it automatically releases the lock (a pattern known as RAII in the C++ world 31).

In our example, this means the two threads each temporarily acquire exclusive access to the Vec, and each obtains a mutable reference to it. Crucially, because the lock correctly implements mutual exclusion, the two mutable references never exist at the same time, so the uniqueness of mutable references is maintained. In other words, Mutex can safely provide mutation of aliased state because it uses a runtime check to ensure that no two mutable references to that state are ever created at once.
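A sketch of this Mutex pattern, again substituting std::thread::scope for the structured join used in [Figure 6]; which thread acquires the lock first is deliberately left nondeterministic:

```rust
use std::sync::Mutex;
use std::thread;

// Sketch: both threads call `lock` through a shared reference; the
// MutexGuard grants temporary exclusive access to the Vec inside.
fn push_in_parallel(mutex_v: &Mutex<Vec<i32>>) {
    thread::scope(|s| {
        s.spawn(|| {
            let mut v = mutex_v.lock().unwrap(); // blocks until acquired
            v.push(12); // mutation through the guard, as through &mut
        }); // the guard is dropped here, releasing the lock
        s.spawn(|| {
            let v = mutex_v.lock().unwrap();
            // prints [10, 11] or [10, 11, 12], depending on lock order
            println!("observer sees {:?}", *v);
        });
    });
}

fn main() {
    let mutex_v = Mutex::new(vec![10, 11]);
    push_in_parallel(&mutex_v);
    println!("final: {:?}", mutex_v.lock().unwrap());
}
```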

**Reference counting.** As we have seen, shared references provide a way to share data between different parts of a program. However, a shared reference has a statically determined lifetime, and when that lifetime ends, the data is once again uniquely owned. This works well with structured parallelism (such as join in our earlier example), but not with unstructured parallelism, where a spawned thread may run independently of, and outlive, its parent.

In Rust, the typical way to share data in such situations is to use an atomically reference-counted pointer: Arc<T> is a pointer to a T that counts how many pointers to that T exist and deallocates the T (releasing its associated resources) when the last pointer is destroyed, that is, when the reference count reaches zero. (This can be seen as a lightweight form of garbage collection.) Since the data is shared, we cannot obtain a &mut T from an Arc<T>, but we can obtain a &T (with the compiler ensuring that the Arc<T> is not destroyed during the lifetime of that reference).


Figure 7. Rust example: Reference counting.

In [Figure 7], we start by creating an Arc pointing to our usual Vec. arc_v2 is obtained by cloning arc_v1, which increments the reference count by one but does not duplicate the underlying data. Then we spawn a thread that uses arc_v2; this thread keeps running in the background even after the function we are writing returns. Because this is unstructured parallelism, we must explicitly move arc_v2 (that is, transfer its ownership) into the closure that runs in the other thread. Arc is a smart pointer (comparable to shared_ptr in C++), so we can use it much like a &Vec<i32>; in particular, in lines 3 and 4 we can print the element at index 1. When arc_v1 and arc_v2 go out of scope, their destructors are implicitly called, and the last Arc deallocates the Vec.
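A sketch of the Arc pattern from [Figure 7]; to keep the example testable, the background thread here returns the element it read instead of printing it, and we join the thread at the end (whereas the figure’s version lets it keep running):

```rust
use std::sync::Arc;
use std::thread;

// Sketch: unstructured parallelism with an atomically reference-counted Vec.
fn read_in_background(arc_v1: &Arc<Vec<i32>>) -> i32 {
    let arc_v2 = Arc::clone(arc_v1); // count goes up; data is not copied
    let handle = thread::spawn(move || {
        // `arc_v2` was moved into this closure; it keeps the Vec alive
        arc_v2[1]
    }); // `arc_v2` is dropped when the thread ends, decrementing the count
    handle.join().unwrap()
}

fn main() {
    let arc_v1 = Arc::new(vec![10, 11]);
    println!("background thread read {}", read_in_background(&arc_v1));
} // the last Arc is dropped here; the count hits zero and the Vec is freed
```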

Thread safety

Rc<T> is the last type we will discuss in this brief introduction to Rust. Rc<T> is a reference-counted pointer very similar to Arc<T>; the key difference is that Arc<T> uses atomic instructions (such as fetch-and-add) to update its reference count, whereas Rc<T> uses ordinary, non-atomic memory operations. As a result, Rc<T> can be faster, but it is not thread-safe. The Rc<T> type is useful in complex sequential code where the statically scoped sharing enforced by shared references is not flexible enough, or where one cannot statically predict which reference to an object will be the last to be dropped (and hence responsible for deallocating it).

Since Rc<T> is not thread-safe, we need to make sure programmers cannot accidentally use an Rc<T> where an Arc<T> is required. This matters: if we took our earlier Arc example and replaced every Arc with Rc, the program would have a data race on the reference count and might deallocate the memory too early, or not at all. Remarkably, the Rust compiler is able to catch such errors statically. Rust does this with a trait called Send: a type T satisfies this type property only if elements of type T can safely be sent to another thread. The type Arc<T> is Send, but Rc<T> is not. Both join and spawn require that everything captured by the closures they run be Send, so if a closure captures a variable of type Rc<T>, compilation fails.
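The distinction can be sketched as follows; the sequential use of Rc compiles and runs, while the commented-out line shows the compile-time error that Send prevents:

```rust
use std::rc::Rc;

// Sketch: Rc works fine sequentially; handing it to another thread
// is a compile-time error because Rc<T> is not Send.
fn rc_count_demo() -> usize {
    let rc_v1 = Rc::new(vec![10, 11]);
    let rc_v2 = Rc::clone(&rc_v1); // non-atomic count update: cheap, but
                                   // only sound within a single thread
    // std::thread::spawn(move || rc_v2[1]);
    // error[E0277]: `Rc<Vec<i32>>` cannot be sent between threads safely
    Rc::strong_count(&rc_v2) // 2: both `rc_v1` and `rc_v2` are alive
}

fn main() {
    println!("strong count: {}", rc_count_demo());
}
```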

Rust’s use of Send demonstrates how a strong static type discipline can sometimes lead to more expressiveness, not less. Notably, C++’s smart reference-counting pointer std::shared_ptr always uses atomic instructions (to be more precise, on Linux it uses atomic instructions whenever the program uses pthreads, that is, whenever it might spawn threads), because offering a more efficient non-thread-safe variant such as Rc is considered too risky. In contrast, Rust’s Send trait enables “fearless concurrency” 26: Rust is a language that modularly offers both thread-safe data structures (such as Arc) and non-thread-safe ones (such as Rc), while guaranteeing that neither is ever used incorrectly.

Safely encapsulating unsafe

We have seen how types like Arc and Mutex let Rust programs safely use features such as reference counting and shared mutable state. There is a catch, though: these types cannot actually be implemented in Rust. Or rather, they cannot be implemented in safe Rust: the compiler would reject the implementation of Arc for potentially violating the aliasing discipline. It would even reject the implementation of Vec for accessing potentially uninitialized memory (Vec manually manages its underlying buffer and tracks which parts of it are initialized, for efficiency). Of course, Arc’s implementation does not actually violate the aliasing discipline, and Vec does not actually access uninitialized memory, but the arguments for why are too subtle for the compiler to infer.

To cope with this, Rust provides an escape hatch: besides the safe Rust we have discussed so far, the language also offers some unsafe features, such as C-style unrestricted pointers. The compiler cannot guarantee the safety (memory safety and/or thread safety) of these features, so they are available only inside blocks marked with the unsafe keyword. This way, one can be sure not to leave safe Rust by accident.
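A minimal sketch of what the unsafe keyword marks: creating a raw pointer is safe, but dereferencing it is not, so the dereference must be wrapped in an unsafe block where the programmer vouches for its validity:

```rust
// Sketch: a raw (C-style) pointer and the `unsafe` block needed to use it.
fn read_through_raw_pointer(x: i32) -> i32 {
    let ptr: *const i32 = &x; // creating the raw pointer is safe
    unsafe { *ptr }           // dereferencing requires `unsafe`: the
                              // compiler cannot check that `ptr` is valid,
                              // but here it clearly points to `x`
}

fn main() {
    println!("{}", read_through_raw_pointer(7));
}
```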


For example, Arc’s implementation uses unsafe code to implement a pattern that cannot be expressed in safe Rust: shared references without a clear owner, managed by thread-safe reference counting. This is further complicated by the support for weak references: weak references do not keep the referenced object alive, but they can be atomically checked for validity and upgraded to full Arcs. The Rust compiler simply cannot verify statically that deallocating the memory when the reference count reaches zero is actually safe.

One alternative to **unsafe** blocks would be to build abstractions like Arc or Vec into the language itself. For example, Python and Swift have reference counting built in, and Python’s built-in lists are the equivalent of Vec. However, these language features are implemented in C or C++, so they are in fact no safer than an implementation in unsafe Rust. Moreover, restricting unsafe operations to built-in implementations severely limits flexibility. For example, Firefox uses a Rust library implementing a variant of Arc without support for weak references, which improves the space usage and performance of code that does not need them. Should the language really provide primitives for every conceivable point in the design space of every built-in type?

The other alternative to unsafe would be to make the type system expressive enough to actually verify the safety of types like Arc. However, the arguments needed to establish the safety of such data structures are so sophisticated (indeed, simplified variants of Arc have served as major case studies in recent formal-verification papers 12, 18) that this would essentially amount to embedding a general-purpose theorem prover into the language, usable only by researchers with sufficient background knowledge. Closing that gap between developers and the verification community still has a long way to go.

**Safe abstraction.** In contrast, Rust’s approach gives programmers the flexibility to write unsafe code when necessary, with the expectation that it be encapsulated behind a safe API. Safe encapsulation means that no matter how Rust APIs such as Arc or Vec are implemented with unsafe code, users of those APIs cannot be affected: as long as users write well-typed code in the safe fragment of Rust, they should never observe anomalous behavior arising from the API’s internal use of unsafe code. This is in stark contrast to C++, whose weak type system lacks the means to even enforce the safe use of an API, leading to reference-counting bugs and iterator invalidation of the kind that cannot occur in safe Rust.

The ability to write unsafe code acts as a lever that Rust programmers use to make the type system more useful in practice without turning it into a theorem prover, and we believe it is key to Rust’s success. The Rust community is developing an entire ecosystem of safely usable, high-performance libraries, enabling programmers to build safe and efficient applications on top of them.

But, of course, there is no free lunch: it is up to the authors of Rust libraries to somehow ensure that, if they use unsafe code, they are being careful not to break Rust’s safety guarantees. On the one hand, this is much better than the situation in C/C++, because the vast majority of Rust code is written in the safe fragment of the language, so Rust’s attack surface is much smaller. On the other hand, when unsafe code really is needed, it is far from obvious whether a programmer’s idea of “being careful” is actually sufficient.

Therefore, to maintain confidence in the safety of the Rust ecosystem, we really want a way to formally verify that safe APIs implemented with unsafe code do in fact uphold Rust’s safety guarantees. That is precisely the goal of the RustBelt project.

RustBelt: Securing the foundations of Rust

The central challenge in verifying Rust’s safety claims is accounting for the interaction between safe and unsafe code. To see why this is challenging, let us briefly review the standard technique for verifying the safety of a programming language, known as the syntactic approach. 14, 37 Using that technique, safety is expressed in terms of a syntactic typing judgment, which gives a formal account of the type checker as a set of mathematical inference rules.

Theorem 1 (Syntactic type soundness). If a program e is well-typed according to the syntactic typing judgment, then e is safe.


Unfortunately, this theorem is too weak for our purposes, because it only talks about syntactically well-typed programs, thus ruling out programs that contain unsafe code. For example, the program if true { e } else { crash() } is not syntactically well-typed, yet it is still safe, since crash() is never executed.

The solution: semantic type soundness

To account for the interaction between safe and unsafe code, we instead use a technique called semantic type soundness, which expresses safety in terms of the observable behavior of a program rather than a fixed set of inference rules. The key element of semantic type soundness is a logical relation, which assigns a safety contract to each API: if the inputs of each method in the API conform to their specified types, then so do the outputs. Using techniques from formal verification, one can then prove that an implementation of the API satisfies its safety contract, as pictured in Figure 8.

Figure 8. Semantic type soundness.
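As a rough illustration (not the actual RustBelt model, which must additionally account for ownership, borrowing, mutable state, and concurrency), a logical relation interprets each type as the set of values satisfying its contract; a function inhabits the interpretation of a function type exactly when it maps contract-satisfying inputs to contract-satisfying outputs:

```latex
% Simplified value interpretation of types (a sketch; the full model
% is step-indexed and formulated in separation logic):
\llbracket \mathtt{bool} \rrbracket
  = \{\, \mathtt{true},\ \mathtt{false} \,\}
\qquad
\llbracket \tau_1 \to \tau_2 \rrbracket
  = \{\, f \mid \forall v \in \llbracket \tau_1 \rrbracket .\;
        f(v) \in \llbracket \tau_2 \rrbracket \,\}
```

Crucially, membership in such an interpretation depends only on behavior, so an implementation full of unsafe blocks can still inhabit it.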

Semantic type soundness is ideal for reasoning about programs that mix safe and unsafe code. For any library that uses unsafe blocks, such as `Arc`, `Mutex`, `Rc`, and `Vec`, one must prove manually that its implementation satisfies its safety contract. For example:

Theorem 2. Arc satisfies its safety contract.
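To see what such a proof obligation is about, here is a deliberately simplified, hypothetical `MiniArc` sketch (not the standard library’s `Arc`; it omits weak references, `Send`/`Sync` impls, and overflow checks). Its safety contract includes, among other things, that the shared allocation is freed exactly once, after the last handle is dropped:

```rust
use std::ops::Deref;
use std::ptr::NonNull;
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Shared heap cell: a reference count plus the payload.
struct Inner<T> {
    count: AtomicUsize,
    data: T,
}

// A minimal Arc-like handle. Two independently owned handles share
// one allocation -- the "fictional separation" the text describes.
struct MiniArc<T> {
    inner: NonNull<Inner<T>>,
}

impl<T> MiniArc<T> {
    fn new(data: T) -> Self {
        let boxed = Box::new(Inner { count: AtomicUsize::new(1), data });
        MiniArc { inner: NonNull::new(Box::into_raw(boxed)).unwrap() }
    }
}

impl<T> Clone for MiniArc<T> {
    fn clone(&self) -> Self {
        // Bump the count; Relaxed suffices for an increment because a
        // new handle can only be created from an existing live one.
        unsafe { self.inner.as_ref() }.count.fetch_add(1, Ordering::Relaxed);
        MiniArc { inner: self.inner }
    }
}

impl<T> Deref for MiniArc<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: the allocation stays live while any handle exists.
        unsafe { &self.inner.as_ref().data }
    }
}

impl<T> Drop for MiniArc<T> {
    fn drop(&mut self) {
        // Decrement; only the last handle frees the allocation.
        if unsafe { self.inner.as_ref() }.count.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            // SAFETY: the count reached zero, so no other handle exists.
            unsafe { drop(Box::from_raw(self.inner.as_ptr())) };
        }
    }
}
```

Nothing in the type system checks that the count is decremented correctly or that the final drop happens exactly once; that is exactly the kind of property a manual proof of the safety contract (Theorem 2) must establish.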

For the safe fragment of the program, verification is automatic. This is expressed by the following theorem, which states that if a component is written in Rust’s safe fragment, it satisfies its safety contract by construction.

Theorem 3 (Fundamental theorem). If component e is syntactically well-typed, then e satisfies its safety contract.

Taken together, these results imply that a Rust program is safe if the only appearances of unsafe are confined to libraries that have been manually verified to satisfy their safety contracts.

Semantic type soundness is an old technique, dating back at least to Milner’s seminal 1978 paper on type soundness, 28 but scaling it up to a modern industrial language like Rust has proven a formidable challenge. In fact, scaling semantic soundness proofs even to languages with mutable state and higher-order functions remained an open problem until the development of step-indexed Kripke logical relations (SKLR) models, 3, 5 as part of the Foundational Proof-Carrying Code project. 2, 4 Even so, verifications of safety contracts encoded directly in terms of SKLR models are still tedious, low-level, and difficult to maintain.

In the RustBelt project, we build on recent work on Iris 19, 20, 21, 23 (a verification framework for higher-order, concurrent, imperative programs, implemented in the Coq proof assistant 1). Iris provides a much higher-level language for encoding and working with SKLR models, which enables us to scale such models up to handle a language as complex as Rust. In particular, Iris is based on separation logic, 29, 30 an extension of Hoare logic 15 geared toward modular reasoning about pointer-manipulating programs and centered on the concept of ownership. This gives us an ideal language in which to model the semantics of Rust’s ownership types.

Iris extends traditional separation logic with several additional features that are crucial for modeling Rust:

  • Iris supports user-defined ghost state: the ability to define custom logical resources that are useful for proving the correctness of a program but do not directly correspond to anything in its physical state. Iris’s user-defined ghost state enables us to verify the soundness of libraries such as `Arc`, whose ownership does not correspond to physical ownership (e.g., two separately owned `Arc`s may point to the same underlying memory), a phenomenon known as fictional separation. 10, 11 By deriving a new domain-specific “lifetime logic” within Iris, it also enables us to reason about Rust’s borrowing and lifetimes at a much higher level of abstraction.

  • Iris supports impredicative invariants: invariants about the program state that may refer cyclically to the existence of other invariants. 32 Impredicative invariants play a crucial role in modeling central type system features such as recursive types and closures.

The complexity of Rust demands that our semantic soundness proofs be machine-checked; they are simply too tedious and error-prone to carry out by hand. Fortunately, Iris comes with a rich set of separation-logic tactics modeled after standard Coq tactics, so Coq users can develop machine-checked semantic soundness proofs in a familiar, time-tested style. 22, 24

Conclusions and Prospects

In this article, we have given a bird’s-eye view of Rust, presenting core concepts such as borrowing, lifetimes, and the encapsulation of unsafe code within safe APIs. These features have helped Rust become the first industry-supported language to overcome the longstanding tradeoff between the safety guarantees of higher-level languages and the resource control offered by lower-level systems programming languages.

To investigate Rust’s safety claims, we have described the technique of semantic type soundness, which has allowed us to begin building a rigorous foundation for Rust in the RustBelt project. For more detail on Rust and RustBelt, please see our POPL’18 paper 18 and the first author’s doctoral dissertation. 17

There is still much work to do. Although RustBelt has recently been extended to account for the relaxed-memory concurrency model that Rust inherits from C++, 9 it has yet to cover a number of other Rust features and APIs, such as its trait system, which is sufficiently complex that it has already led to subtle soundness bugs. 7 And while verifying the soundness of an unsafe Rust library currently requires a deep background in formal semantics, we hope to eventually develop formal methods that can be put directly into the hands of programmers.

Finally, while RustBelt has focused on building foundations for the safety of Rust itself, we are pleased to see other research projects (notably Prusti 6 and RustHorn 27) beginning to explore an exciting, orthogonal direction: the potential of Rust’s strong type system to serve as a powerful tool for simplifying the formal verification of systems code.

References

1. The Coq proof assistant, 2019; coq.inria.fr/.

2. Ahmed, A., Appel, A.W., Richards, C.D., Swadi, K.N., Tan, G. and Wang, D.C. Semantic foundations for typed assembly languages. TOPLAS 32, 3 (2010).

3. Ahmed, A.J. Semantics of types for mutable state. Ph.D. thesis, Princeton University, 2004.

4. Appel, A.W. Foundational proof-carrying code. LICS, 2001.

5. Appel, A.W. and McAllester, D. An indexed model of recursive types for foundational proof-carrying code. TOPLAS 23, 5 (2001).

6. Astrauskas, V., Müller, P., Poli, F. and Summers, A.J. Leveraging Rust types for modular specification and verification. PACMPL 3 (OOPSLA), 2019.

7. Ben-Yehuda, A. Coherence can be bypassed by an indirect impl for a trait object, 2019; github.com/rust-lang/r… .

8. Burch, A. Using Rust in Windows. Blog post, 2019; msrc-blog.microsoft.com/2019/11/07/… .

9. Dang, H.-H., Jourdan, J.-H., Kaiser, J.-O. and Dreyer, D. RustBelt meets relaxed memory. PACMPL 4 (POPL), 2020.

10. Dinsdale-Young, T., Dodds, M., Gardner, P., Parkinson, M.J. and Vafeiadis, V. Concurrent abstract predicates. ECOOP, 2010.

11. Dinsdale-Young, T., Gardner, P. and Wheelhouse, M.J. Abstraction and refinement for local reasoning. VSTTE, 2010.

12. Doko, M. and Vafeiadis, V. Tackling real-life relaxed concurrency with FSL++. ESOP 10201, LNCS, 2017.

13. Grossman, D., Morrisett, G., Jim, T., Hicks, M., Wang, Y. and Cheney, J. Region-based memory management in Cyclone. PLDI, 2002.

14. Harper, R. Practical Foundations for Programming Languages (2nd Ed.). Cambridge University Press, 2016.

15. Hoare, C.A.R. An axiomatic basis for computer programming. Commun. ACM 12, 10 (1969).

16. Hosfelt, D. Implications of rewriting a browser component in Rust. Blog post, 2019; hacks.mozilla.org/2019/02/rew… .

17. Jung, R. Understanding and Evolving the Rust Programming Language. Ph.D. thesis, Universität des Saarlandes, 2020; people.mpi-sws.org/~jung/thesi… .

18. Jung, R., Jourdan, J.-H., Krebbers, R. and Dreyer, D. RustBelt: Securing the foundations of the Rust programming language. PACMPL 2 (POPL), 2018.

19. Jung, R., Krebbers, R., Birkedal, L. and Dreyer, D. Higher-order ghost state. ICFP, 2016.

20. Jung, R., Krebbers, R., Jourdan, J.-H., Bizjak, A., Birkedal, L. and Dreyer, D. Iris from the ground up: A modular foundation for higher-order concurrent separation logic. JFP 28 (2018).

21. Jung, R., Swasey, D., Sieczkowski, F., Svendsen, K., Turon, A., Birkedal, L. and Dreyer, D. Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning. POPL, 2015.

22. Krebbers, R., Jourdan, J.-H., Jung, R., Tassarotti, J., Kaiser, J.-O., Timany, A., Charguéraud, A. and Dreyer, D. MoSeL: A general, extensible modal framework for interactive proofs in separation logic. PACMPL 2 (ICFP), 2018.

23. Krebbers, R., Jung, R., Bizjak, A., Jourdan, J., Dreyer, D. and Birkedal, L. The essence of higher-order concurrent separation logic. ESOP, 2017.

24. Krebbers, R., Timany, A. and Birkedal, L. Interactive proofs in higher-order concurrent separation logic. POPL, 2017.

25. Levick, R. Why Rust for safe systems programming. Blog post, 2019; msrc-blog.microsoft.com/2019/07/22/… .

26. Matsakis, N. and Turon, A. Rust in 2016. Blog post, 2015; blog.rust-lang.org/2015/08/14/… .

27. Matsushita, Y., Tsukada, T. and Kobayashi, N. RustHorn: CHC-based verification for Rust programs. ESOP, 2020.

28. Milner, R. A theory of type polymorphism in programming. J. Computer and System Sciences 17, 3 (1978).

29. O’Hearn, P.W., Reynolds, J.C. and Yang, H. Local reasoning about programs that alter data structures. CSL, 2001.

30. O’Hearn, P.W. Resources, concurrency and local reasoning. Theoretical Computer Science 375, 1–3 (2007).

31. Stroustrup, B. The C++ Programming Language. Addison-Wesley, 2013.

32. Svendsen, K. and Birkedal, L. Impredicative concurrent abstract predicates. ESOP, 2014.

33. Thomas, G. A proactive approach to more secure code. Blog post, 2019; msrc-blog.microsoft.com/2019/07/16/… .

34. Tofte, M. and Talpin, J. Region-based memory management. Information and Computation 132, 2 (1997).

35. Tu, T., Liu, X., Song, L. and Zhang, Y. Understanding real-world concurrency bugs in Go. ASPLOS, 2019.

36. Walker, D. Substructural type systems. Advanced Topics in Types and Programming Languages. B.C. Pierce, Ed. MIT Press, Cambridge, MA, 2005.

37. Wright, A.K. and Felleisen, M. A syntactic approach to type soundness. Information and Computation 115, 1 (1994).