Paper appreciation | SafeDrop: through static data flow analysis to detect Rust release program memory error

Editor: Zhang Handong

Original paper: arxiv.org/abs/2103.15…

Editor’s note:

The paper was co-authored by four fudan university students, including two co-first authors cui Mohan and Chen Chengjun, who also participated in the 3rd RustFriday Online salon on April 9. Play back the address: www.bilibili.com/video/BV1nU…

Chinese community of Rust fly book group Invite you to join: applink. Feishu. Cn/TeLAcbDR

The SafeDrop tool mentioned in the paper is not open source yet and will be when it is further improved.

This article is not a full translation, just excerpts of key parts. Welcome to the salon video playback.

introduce

Rust is an emerging programming language designed to prevent memory-safe errors. However, Rust’s current design also has side effects that may increase the risk of memory security issues. In particular, it employs OBRM (proprietary resource management) and performs automatic reallocation of unused resources without a garbage collector. As a result, it can incorrectly handle reclaimed memory and cause a free after use or double free problem. In this paper, we investigate the problem of invalid memory allocation and propose SafeDrop, a static path-sensitive data flow analysis method to detect such errors.

Our method iteratively analyzes each OF Rust Crate’s apis by iterating through the control flow diagram and extracting all aliases for each data flow. To ensure accuracy and scalability, we leverage the improved Tarjan algorithm for scalable path-sensitive analysis, as well as cache-based strategies for efficient interprocess analysis. Our experimental results show that our method can successfully detect all existing Cves with such problems, with a limited number of false positives. The analysis overhead ranges from 1.0% to 110.7% compared to the original compilation time. We further applied the tool to several real Rust Grates and found eight Rust Grates involving invalid memory release issues.

The OBRM (Owner-based Resource Management) model assumes that resources should be allocated when owners are created and released when their owners are out of valid scope. Ideally, the model should be able to prevent pointer dangling and memory leaks even if the program encounters exceptions.

However, we observed that in reality many of Rust Crate’s key errors are related to this auto-release scheme, for example, it may incorrectly discard some still-in use buffers and incur use-after-free errors (for example, CVE-2019-16140), Or it may incorrectly discard the Pointers and result in a double free (for example, CVE-2019-16144).

Typically, a memory release error is triggered by the Unsafe Rust code. In Rust, the Unsafe Rust API is required to provide low-level control and abstraction over implementation details. However, misuse of the Unsafe Rust API may compromise the soundness of proprietary resource management systems and may result in uncertain behavior. For example, the Unsafe Rust API might cause memory reclamation for shared aliases, while removing one instance would cause Pointers to the remaining aliases to be dangling. Additionally, internal Unsafe Rust allows functions that only have Unsafe code internally to be called safe functions, which may have memory safety issues. The current Rust compiler is ignorant of the memory-safety risks of Unsafe Rust code, and simply assumes that developers are responsible for using it. Memory security is the most important feature Rust advocates, and it is important to mitigate this risk if possible.

To solve this problem, SafeDrop, a static path sensitive data flow analysis method, is proposed to automatically detect the memory safety error release mechanism. Safe-drop repeatedly analyzes each API in Rust Crate and carefully checks for each Drop statement in Rust MIR (intermediate representation) to see if it is Safe to start. Because the potential problem with alias analysis is determinable, it is difficult to solve, so we have employed several designs to improve the scalability of our approach without sacrificing too much precision. Our approach uses the path encounter (MOP) method and extracts all valuable paths of each function based on the improved Tarjan algorithm, which is effective for eliminating redundant paths in loops with the same alias relationship. For each path, we extract the collection of all aliases in a traffic-sensitive manner and analyze the security of each DropStatement accordingly. When we encounter a function call, we recursively execute SafeDrop on the caller and analyze the alias relationship between the parameters and the return value. To avoid duplicate analysis, we cache and reuse the alias results obtained by each function.

MIR is introduced

Since SafeDrop is analyzed based on MIR, let’s first understand MRI correlation.

The compilation of Rust produces MIR (intermediate intermediate language) in the following syntax:

BasicBlock := {Statement} Terminator Statement := LValue = RValue | StorageLive(Value) | StorageDead(Value) | ... LValue := LValue | LValue.f | *LValue | ... RValue := LValue | move LValue | & LValue | & mut LValue | * LValue | * mut LValue | ... Terminator := Goto(BB) | Panic(BB) | Return | Resume | Abort | If(Value, BB0, BB1) | LVALUE = (FnCall, BB0, BB1) | Drop(Value, BB0, BB1) | SwitchInt(Value, BB0, BB1, BB2, ...) |...Copy the code

MIR is a directed acyclic Graph based on the Control Flow Graph (CFG), which mainly contains several syntactic elements mentioned above:

  • BasicBlock, which consists of basic statements and terminators.
  • The Statement, the base Statement, is the corresponding code for each line of Rust code that is translated into MIR
  • LValue && RValue, corresponding to position expression and value expression respectively.
  • The Terminator, Terminator, BasicBlock exports, including various forms of jump, such as the Return/Resume/Abort/IF/Drop and so on

SafeDrop is based on MIR analysis

As shown in the figure. The code is on the left, and the MIR generated by that code is on the right.

SafeDrop relies on MIR for static path analysis. The left side of the code gets the pointer to the heap PTR from the string S, and then creates a new variable with ownership of the heap memory via Vec::from_raw_parts. Now we have two Pointers to the same heap. I then commented out mem::forget(s), and the code had double free problems. However, the use-after-free problem will occur if you unpack mem::forget(s).

This code is relatively simple, but Miri can also detect both types of UB. However, there are differences between SafeDrop and Miri.

SafeDrop checks the call path in MIR and finds that drop(_0) and drop(_1) are called twice. Similarly, if mem::forget(s) annotation is turned on, it will also detect that UB is being used again after drop.

Formal definition problem

After some experimentation, the SafeDrop team formalized some of the most common issues addressed in the Unsafe Rust:

  • Buffering buffers in use

    If the algorithm mistakenly frees some buffers that will be accessed at a later time, it results in Pointers that are vulnerable to memory safety issues, including use after frees and double frees.

  • Setting off invalid Pointers. If an invalid pointer is dangling, discarding the pointer results in a double free; Otherwise, the pointer is lost. If an invalid pointer points to uninitialized memory that contains a pointer type, deleting the pointer may recursively delete its nested pointer and result in an invalid memory access.

A typical pattern for detecting invalid memory locations in SafeDrop

  • UAF: use after free;
  • DF: double free;
  • IMA: invalid memory access

SafeDrop detects invalid memory locations following seven patterns:

  • Pattern 1: GetPtr() -> UnsafeConstruct() -> Drop() -> Use() => UAF
  • Pattern 2: GetPtr() -> UnsafeConstruct() -> Drop() -> Drop() => DF
  • Pattern 3: GetPtr() -> Drop() -> UnsafeConstruct() -> Use() => UAF
  • Pattern 4: GetPtr() -> Drop() -> UnsafeConstruct() -> Drop() => DF
  • Pattern 5: GetPtr() -> Drop() -> Use() => UAF
  • Pattern 6: Uninitialized() -> Use() => IMA
  • Pattern 7: Uninitialized() -> Drop() => IMA

Detection method

The overall architecture

Above is a diagram of SafeDrop’s workflow. Integrate “Data Flow analysis by path-sensitive method” into a compiler named SafeDrop for analysis. It enters MIR for each function and prints warnings of potentially invalid memory.

SafeDrop key steps:

  • Path extraction: SafeDrop adopts the “meet-over-path” approach to achieve path sensitivity. Since the paths of functions can be infinite, we use a novel approach based on Tarjan algorithm to merge redundant paths and generate spanning trees. This step will traverse the tree and eventually enumerate all valuable paths.
  • Alias analysis: SafeDrop is field sensitive. This step analyzes the alias relationships between variables of each data flow and fields of compound type. SafeDrop is also cross-application and context-insensitive. It caches and reuses the alias relationship of the called obtained between the return value and the parameter.
  • Invalid DROP detection: Based on the previously established set of aliases, this step searches for potentially risky DROP patterns and logs suspicious code segments for each data flow.

The specific algorithm of the above three steps is described in detail in the paper. If you are interested in the algorithm details, you can read the paper by yourself.

The team has integrated SafeDrop with the Rust compiler V1.52 (unofficial action) and can be used by command-line tools such as RUSTC and Cargo.

Miri vs SafeDrop

Rust Miri is Rust MIR’s experimental interpreter, which can run binaries, the Cargo project’s test suite and detect undefined behavior.

Miri is more versatile than SafeDrop. Miri supports checking for out-of-bounds memory access, aligning insufficient memory access, and violating certain primitive type invariants.

However, Miri is a dynamic analysis method and cannot trace all the valuable paths of a program that cannot perform analysis and library analysis. In contrast, SafeDrop is a static method that analyzes all valuable paths for each function at compile time.

Future challenges

  • Enhanced performance.
  • Reduce detection error.
  • Check the Rust standard library.
  • Open source
  • other