The original

Translator: Chai Jie


[TOC]

If you have programming experience in other languages, you will bring that experience with you when you switch to Rust. Usually, this is good because it means you don’t have to learn programming from scratch. However, it can also lead to bad habits that make you write bad code.

Use sentry values

In most C-like programming languages (C, C#, Java, etc.), the way to indicate that an operation failed or could not be found is to return a “special” value. For example, C#’s string.indexof () method scans an element in an array and returns its index. If none is found, -1 is returned.

This results in code like this:

string sentence = "The fox jumps over the dog";

int index = sentence.IndexOf("fox");

if(index ! = -1)
{
  string wordsAfterFox = sentence.SubString(index);
  Console.WriteLine(wordsAfterFox);
}
Copy the code

You’ll often see this “use a sentinel value to indicate something special,” as well as similar sentinels or null (someone once called them “billion-dollar errors”).

This is a bad idea, because there is absolutely nothing to stop you from forgetting this check. This means you could crash your program accidentally because of a faulty assumption, or because the code that generates the sentinel is far from the code that uses it.

But we can do better in Rust. Just use Option!

In Rust, you cannot get a valid value without handling the case that Option might be None. This is enforced by the compiler at compile time, which means that code that forgets to check won’t even compile.

let sentence = "The fox jumps over the dog";
let index = sentence.find("fox");

// let words_after_fox = &sentence[index..] ; // Error: Can't index str with Option
      

if let Some(fox) = index {
  letwords_after_fox = &sentence[fox..] ;println!("{}", words_after_fox);
}
Copy the code

Hungarian notation

In the 1970s, a naming convention called Hungarian notation began to be used in untyped or dynamically typed languages. It works by adding a mnemonic to the beginning of a name to indicate what it represents, for example, a Boolean access variable might be called bVisited, and a string name might be called strName.

You can still see this naming convention in Delphi, where classes (types) start with T, fields start with F, parameters start with A, and so on.

type
 TKeyValue = class
  private
    FKey: integer;
    FValue: TObject;
  public
    property Key: integer read FKey write FKey;
    property Value: TObject read FValue write FValue;
    function Frobnicate(ASomeArg: string): string;
  end;
Copy the code

C# also has a convention that all interfaces should start with I, which means programmers switching from C# to Rust will sometimes add I before traits as well.

trait IClone {
  fn clone(&self) - >Self;
}

let account_bytes: Vec<u8> = read_some_input();
let account_str = String::from_utf8(account_bytes)? ;letaccount: Account = account_str.parse()? ;Copy the code

In this case, just drop the I in front of it. Rust’s syntax ensures that it is impossible to confuse traits with normal types, so this markup is redundant. This is in stark contrast to C#, where interfaces and classes are largely interchangeable.

You can also see this in functions, where people rename something when they convert it from one form to another. These names are often silly or fabricated and provide negligible additional information to the reader.

let account_bytes: Vec<u8> = read_some_input();
let account_str = String::from_utf8(account_bytes)? ;letaccount: Account = account_str.parse()? ;Copy the code

I mean, if we call String::from_utf8(), we already know account_str will be a String, so why add the _str suffix?

Unlike many other languages, Rust encourages you to implicitly handle variables when you convert them from one form to another, especially if the previous variable is no longer accessible (for example, it has been moved).

let account: Vec<u8> = read_some_input();
let account = String::from_utf8(account)? ;letaccount: Account = account.parse()? ;Copy the code

This is kind of an advantage, because we can use the same name for the same concept.

Other languages dislike implicit processing because it is easy to lose track of what type a variable contains (for example, in dynamically typed languages like JavaScript) or to introduce errors (for example, when a programmer thinks a variable is a certain type, but it actually contains something else).

Neither of these cases is important for a strongly typed language like Rust with mobile semantics, so you can feel free to use implicit processing without worrying about getting yourself into trouble.

A lot of Rc < RefCell >

A common pattern in object-oriented languages is to take a reference to an object so that you can call its methods.

There’s nothing wrong with that, dependency injection is a very good thing, but unlike most OO languages, Rust doesn’t have a garbage collector, and Rust has a strong sense of shared variability.

Maybe it’s easier to understand with an example. Let’s say we’re implementing a game where the player has to beat up a bunch of monsters until a certain amount of damage is done to them.

We created a monster class that has a health property and a takeDamage() method, and in order to be able to keep track of how much damage has been done, we’re going to have people provide callbacks that will be called whenever the monster gets hurt.

type OnReceivedDamage = (damageReceived: number) = > void;

class Monster {
    health: number = 50;
    receivedDamage: OnReceivedDamage[] = [];

    takeDamage(amount: number) {
        amount = Math.min(this.health, amount);
        this.health -= amount;
        this.receivedDamage.forEach(cb= > cb(amount));
    }

    on(event: "damaged".callback: OnReceivedDamage): void {
        this.receivedDamage.push(callback); }}Copy the code

Let’s also create a DamageCounter class that tracks how much damage we do and lets us know when that goal is reached.

class DamageCounter {
    damageInflicted: number = 0;

    reachedTargetDamage(): boolean {
        return this.damageInflicted > 100;
    }

    onDamageInflicted(amount: number) {
        this.damageInflicted += amount; }}Copy the code

Now we’re going to create some monsters and keep doing random amounts of damage until DamageCounter is satisfied.

const counter = new DamageCounter();

const monsters = [new Monster(), new Monster(), new Monster(), new Monster(), new Monster()];
monsters.forEach(m= > m.on("damaged".amount= > counter.onDamageInflicted(amount)));

while(! counter.reachedTargetDamage()) {// pick a random monster
    const index = Math.floor(Math.random()*monsters.length);
    const target = monsters[index];
    // then damage it a bit
    const damage = Math.round(Math.random() * 50);
    target.takeDamage(damage);

    console.log(`Monster ${index} received ${damage} damage`);
}
Copy the code

Now write this code in Rust, using Box<dyn Fn(U32)> to represent a closure that takes a single U32 argument (all closures in JavaScript are assigned to the heap by default).

type OnReceivedDamage = Box<dyn Fn(u32) >;struct Monster {
    health: u32,
    received_damage: Vec<OnReceivedDamage>,
}

impl Monster {
    fn take_damage(&mut self, amount: u32) {
        let damage_received = cmp::min(self.health, amount);
        self.health -= damage_received;
        for callback in &mut self.received_damage { callback(damage_received); }}fn add_listener(&mut self, listener: OnReceivedDamage) {
        self.received_damage.push(listener); }}impl Default for Monster {
    fn default() - >Self {
        Monster { health: 100, received_damage: Vec::new() }
    }
}
Copy the code

Next up is DamageCounter.

#[derive(Default)]
struct DamageCounter {
    damage_inflicted: u32,}impl DamageCounter {
    fn reached_target_damage(&self) - >bool {
        self.damage_inflicted > 100
    }

    fn on_damage_received(&mut self, damage: u32) {
        self.damage_inflicted += damage; }}Copy the code

Finally, the code that causes damage.

fn main() {
    let mut rng = rand::thread_rng();
    let mut counter = DamageCounter::default();
    let mut monsters: VecThe < _ > = (0.5).map(|_| Monster::default()).collect();

    for monster in &mut monsters {
        monster.add_listener(Box::new(|damage| counter.on_damage_received(damage)));
    }

    while! counter.reached_target_damage() {let index = rng.gen_range(0..monsters.len());
        let target = &mut monsters[index];

        let damage = rng.gen_range(0.50);
        target.take_damage(damage);

        println!("Monster {} received {} damage", index, damage); }}Copy the code

When compiling the code, Rustc gives four compilation errors about monster.add_listener().

error[E0596]: cannot borrow `counter` as mutable, as it is a captured variable in a `Fn` closure
  --> src/main.rs:47:48
   |
47 |         monster.add_listener(Box::new(|damage| counter.on_damage_received(damage)));
   |                                                ^^^^^^^ cannot borrow as mutable

error[E0499]: cannot borrow `counter` as mutable more than once at a time
  --> src/main.rs:47:39
   |
47 |         monster.add_listener(Box::new(|damage| counter.on_damage_received(damage)));
   |                              ---------^^^^^^^^------------------------------------
   |                              |        |        |
   |                              |        |        borrows occur due to use of `counter` in closure
   |                              |        `counter` was mutably borrowed here in the previous iteration of the loop
   |                              cast requires that `counter` is borrowed for `'static`

error[E0597]: `counter` does not live long enough
  --> src/main.rs:47:48
   |
47 |         monster.add_listener(Box::new(|damage| counter.on_damage_received(damage))); | ------------------^^^^^^^---------------------------- | | | | | | | borrowed value does not live long enough | | value  captured here | cast requires that `counter` is borrowedfor `'static`...60 | }
   | - `counter` dropped here while still borrowed

error[E0502]: cannot borrow `counter` as immutable because it is also borrowed as mutable
  --> src/main.rs:50:12
   |
47 |         monster.add_listener(Box::new(|damage| counter.on_damage_received(damage)));
   |                              -----------------------------------------------------
   |                              |        |        |
   |                              |        |        first borrow occurs due to use of `counter` in closure
   |                              |        mutable borrow occurs here
   |                              cast requires that `counter` is borrowed for `'static`...50 |     while! counter.reached_target_damage() { | ^^^^^^^ immutable borrow occurs hereCopy the code

These compilation issues boil down to:

  • The closure captures a reference to counter

  • The counter.on_damage_received() method requires &mut self, so our closure needs an &mut reference. We add closures in a loop, so we end up getting multiple &mut references to the same object at the same time.

  • The listener is a box closure with no lifecycle parameters, which means it has ownership of the variable. Because we moved counter into the closure in the loop, we got a “use of Moved Value” error.

  • After passing counter to add_listener(), you try to use it in the loop condition.

The typical solution to the above problem is to wrap DamageCounter in a reference-counting pointer, which allows multiple handles to exist simultaneously. And since we also need to call a &mut self method, we need a RefCell to “move” the borrowing check from compile time to run time.

fn main() {
     let mut rng = rand::thread_rng();
-    let mut counter = DamageCounter::default();
+    let mut counter = Rc::new(RefCell::new(DamageCounter::default()));
     let mut monsters: VecThe < _ > = (0.5).map(|_| Monster::default()).collect();

     for monster in &mut monsters {
-        monster.add_listener(Box::new(|damage| counter.on_damage_received(damage)));
+        let counter = Rc::clone(&counter);
+        monster.add_listener(Box::new(move |damage| {
+            counter.borrow_mut().on_damage_received(damage)
+        }));
     }

-    while! counter.reached_target_damage() { +while! counter.borrow().reached_target_damage() {let index = rng.gen_range(0..monsters.len());
         let target = &mutmonsters[index]; . }}Copy the code

HMM… It works. But this approach tends to be confusing, especially if you store something non-trivial in the structure, such as Rc<RefCell<Vec>>> (or Arc<Mutex<Vec>>> about multithreading).

This also opens you up to situations where the RefCell can be borrowed multiple times because your code is complex and something higher on the call stack is already using the RefCell. In the case of Mutex, this would cause a deadlock, while RefCell would cause panic, both of which would have a negative impact on the reliability of the program.

A better approach is to change your API so that it does not hold long-term references to other objects. Depending on the situation, it may be wise to accept a callback parameter in the Monster::take_damage() method.

struct Monster {
    health: u32,}impl Monster {
    fn take_damage(&mut self, amount: u32, on_damage_received: impl FnOnce(u32)) {
        let damage_received = cmp::min(self.health, amount);
        self.health -= damage_received; on_damage_received(damage_received); }}impl Default for Monster {
  fn default() - >Self { Monster { health: 100}}}...fn main() {
    let mut rng = rand::thread_rng();
    let mut counter = DamageCounter::default();
    let mut monsters: VecThe < _ > = (0.5).map(|_| Monster::default()).collect();

    while! counter.reached_target_damage() {let index = rng.gen_range(0..monsters.len());
        let target = &mut monsters[index];

        let damage = rng.gen_range(0.50);
        target.take_damage(damage, |dmg| counter.on_damage_received(dmg));

        println!("Monster {} received {} damage", index, damage); }}Copy the code

One benefit of this is that we get rid of all the callback management template code, which means that this version is only 47 lines, as opposed to the 62 lines of the Rc<RefCell<_>> version.

In some cases, it may not be acceptable to give take_damage() a callback argument, at which point you can return a “summary” of what happened so the caller can decide what to do next.

impl Monster {
    fn take_damage(&mut self, amount: u32) -> AttackSummary {
        let damage_received = cmp::min(self.health, amount);
        self.health -= damage_received;
        AttackSummary { damage_received }
    }
}

struct AttackSummary {
    damage_received: u32,}...fn main() {
    let mut rng = rand::thread_rng();
    let mut counter = DamageCounter::default();
    let mut monsters: VecThe < _ > = (0.5).map(|_| Monster::default()).collect();

    while! counter.reached_target_damage() {let index = rng.gen_range(0..monsters.len());
        let target = &mut monsters[index];

        let damage = rng.gen_range(0.50);
        let AttackSummary { damage_received } = target.take_damage(damage);
        counter.on_damage_received(damage_received);

        println!("Monster {} received {} damage", index, damage); }}Copy the code

This is my preferred solution; As a rule of thumb, it tends to work well for larger code or complex code.

** Uses the wrong integer type **

Another bad habit of writing a lot of C is using the wrong integer types and getting frustrated with the frequent conversions of usize.

Especially when it comes to indexing, C programmers are taught to use ints for indexing and for-loops, so when they use Rust and need to store a list of indexes, they tend to use Vec. Then they get frustrated because Rust is pretty strict about indexing, and standard types like arrays, slicing, and Vec can only be indexed using usize (equivalent to size_t), which means their code gets messed up by the conversion from i32 to USize.

There are many valid reasons why Rust only allows indexing with USize:

  • It doesn’t make sense to have a negative index (it’s illegal to access items before a fragment starts), so we can avoid bugs by indexing them with unsigned integers.

  • Usize is defined as an integer of the same size as a normal pointer, which means that pointer operations don’t have any hidden conversions.

  • The STD ::mem::size_of() and STD ::mem::align_of() functions return the usize type.

The solution is obvious. Choose the correct integer type for your application. When used for indexing, this “correct integer type” is likely to be usize.

** not safe – I know what I’m doing **

I think of this phrase every time I see a dusty C programmer looking for a raw pointer or STD ::mem:: Transmute () because the borrowed inspector keeps rejecting their code.

You’ll often see people trying to hack privacy, create self-referential structures, or create globally mutable variables in an unsafe way. Comments like “But I know this program only uses one thread, so it’s ok to access a static Mut” or “But this works perfectly in C” are common.

The reality is that insecure code is slightly different, and you need to have a good intuition about Rust’s borrowing check rules and memory model. I don’t want to be the goalie and say, “You have to be this tall to write unsafe multithreaded code,” but if you’re new to the language, you probably don’t have that intuition and will be putting yourself and your colleagues through a lot of pain.

Playing with insecure stuff is fine if you want to learn more about Rust, or if you know what you’re doing and use it legally, but insecure stuff is not a magical sanctuary that will stop the compiler from complaining and let you write C in Rust syntax.

** does not use the namespace **

In C, it is a common practice to prefix a function with the name of a library or module to help the reader understand where it is coming from and to avoid repeating symbolic errors (such as rune_wasmer_runtime_load()).

Rust, however, there are real namespace, and allows you to attach the method to the type (for example, rune: : wasmer: : Runtime: : the load ()).

** Excessive use of slice index **

For-loops and indexes are used frequently in C-like languages.

let points: Vec<Coordinate> = ... ;let differences = Vec::new();

for i in 1..points.len() [
  let current = points[i];
  let previous = points[i-1];
  differences.push(current - previous);
]
Copy the code

However, even experienced programmers are not immune to introducing errors when using indexes (for example, I need to remember to start the loop from 1 and subtract 1 to get to the previous point).

In this case, Rust encourages you to use iterators. The slice type even comes with advanced tools, such as Windows () and array_windows() methods, that let you iterate over adjacent pairs of elements.

let points: Vec<Coordinate> = ... ;let mut differences = Vec::new();

for [previous, current] in points.array_windows().copied() {
  differences.push(current - previous);
}
Copy the code

You can even remove for-loop and differences variables.

let differences: Vec<_> = points
  .array_windows()
  .copied()
  .map(|[previous, current]| current - previous)
  .collect();
Copy the code

Some people think the versions with map() and Collect () are cleaner or more “practical”, but I’ll let you judge for yourself.

And iterators generally have better performance because checking can be done as part of the loop condition rather than in isolation.

Overuse of iterators

Overusing iterators can also cause problems: when all you have is a hammer, everything looks like a nail.

The long chain of map(), filter(), and and_then() calls can be hard to read and track down what’s actually happening, especially if type reasoning lets you omit the types of closure arguments.

Other times, your iterator-based solution is just unnecessarily complex. As an example, take a look at this code and see if you can figure out what it does.

pub fn functional_blur(input: &Matrix) -> Matrix {
    assert!(input.width >= 3);
    assert!(input.height >= 3);

    // Stash away the top and bottom rows so they can be
    // directly copied across later
    let mut rows = input.rows();
    let first_row = rows.next().unwrap();
    let last_row = rows.next_back().unwrap();

    let top_row = input.rows();
    let middle_row = input.rows().skip(1);
    let bottom_row = input.rows().skip(2);

    let blurred_elements = top_row
        .zip(middle_row)
        .zip(bottom_row)
        .flat_map(|((top, middle), bottom)| blur_rows(top, middle, bottom));

    let elements: Vec<f32> = first_row
        .iter()
        .copied()
        .chain(blurred_elements)
        .chain(last_row.iter().copied())
        .collect();

    Matrix::new_row_major(elements, input.width, input.height)
}

fn blur_rows<'a>(
    top_row: &'a [f32],
    middle_row: &'a [f32],
    bottom_row: &'a [f32]) - >impl Iterator<Item = f32> + 'a {
    // stash away the left-most and right-most elements so they can be copied across directly.
    let &first = middle_row.first().unwrap();
    let &last = middle_row.last().unwrap();

    // Get the top, middle, and bottom row of our 3x3 sub-matrix so they can be
    // averaged.
    let top_window = top_row.windows(3);
    let middle_window = middle_row.windows(3);
    let bottom_window = bottom_row.windows(3);

    // slide the 3x3 window across our middle row so we can get the average
    // of everything except the left-most and right-most elements.
    let averages = top_window
        .zip(middle_window)
        .zip(bottom_window)
        .map(|((top, middle), bottom)| top.iter().chain(middle).chain(bottom).sum::<f32> () /9.0);

    std::iter::once(first)
        .chain(averages)
        .chain(std::iter::once(last))
Copy the code

Believe it or not, this is one of the more readable versions I’ve seen. Now let’s look at imperative implementations.

pub fn imperative_blur(input: &Matrix) -> Matrix {
    assert!(input.width >= 3);
    assert!(input.height >= 3);

    // allocate our output matrix, copying from the input so
    // we don't need to worry about the edge cases.
    let mut output = input.clone();

    for y in 1..(input.height - 1) {
        for x in 1..(input.width - 1) {
            let mut pixel_value = 0.0;

            pixel_value += input[[x - 1, y - 1]];
            pixel_value += input[[x, y - 1]];
            pixel_value += input[[x + 1, y - 1]];

            pixel_value += input[[x - 1, y]];
            pixel_value += input[[x, y]];
            pixel_value += input[[x + 1, y]];

            pixel_value += input[[x - 1, y + 1]];
            pixel_value += input[[x, y + 1]];
            pixel_value += input[[x + 1, y + 1]];

            output[[x, y]] = pixel_value / 9.0;
        }
    }

    output
}
Copy the code

I like this readable version.

No pattern matching is used

In most other mainstream languages, it is common for programmers to write a check before doing an operation that might throw an exception. Our previous C# IndexOf() fragment is a good example.

int index = sentence.IndexOf("fox");

if(index ! = -1)
{
  string wordsAfterFox = sentence.SubString(index);
  Console.WriteLine(wordsAfterFox);
}
Copy the code

You might see code like this:

let opt: OptionThe < _ > =... ;if opt.is_some() {
  letvalue = opt.unwrap(); . }Copy the code

or

let list: &[f32] = ...;

if! list.is_empty() {let first = list[0]; . }Copy the code

Both fragments are now perfectly valid code and never fail, but like sentinels, they are prone to introducing errors when refactoring code in the future.

Using pattern matching and Option can help you avoid this situation because it ensures that the only way you can access a value is if it is valid.

if let Some(value) = opt {
  ...
}

if let[first, ..]  = list { ... }Copy the code

Class after construction

In many languages, it is normal to call an object’s constructor and then initialize its fields (either manually or by calling some init() method). However, this violates the general convention of Rust to make an invalid state unexpressible.

Suppose you are writing an NLP application and have a dictionary containing all possible words.

Here is one way to create a dictionary:

let mut dict = Dictionary::new();
// read the file and populate some internal HashMap or Vec
dict.load_from_file("./words.txt")? ;Copy the code

However, writing a Dictionary this way means that it now has two (hidden) states — empty and populated.

All downstream code that uses Dictionary assumes that it is already populated and writes code accordingly. If you pass an empty dictionary to code that expects a populated dictionary, you can cause panic. For example, indexing a dictionary with dict[“word”] will cause panic if “word” is not there.

But this is completely unnecessary. Just make sure the dictionary is available immediately after the build, rather than populating it afterwards.

let dict = Dictionary::from_file("./words.txt")? ;impl Dictionary {
  fn from_file(filename: impl AsRef<Path>) -> Result<Self, Error> {
    lettext = std::fs::read_to_string(filename)? ;let mut words = Vec::new();
    for line in text.lines() {
      words.push(line);
    }
    Ok(Dictionary { words })
  }
}
Copy the code

Dictionary::from_file() might create an empty Vec and populate it step by step, but it won’t be stored in the Dictionary’s fields yet, so there’s no assumption that it’s populated and useful.

How often you fall into this antipattern depends largely on your background and coding style.

Functional languages are usually immutable, so you will naturally use idiomatic patterns. After all, it’s a little hard to create something semi-initialized and populate it later when you’re not allowed to change anything.

Object-oriented languages, on the other hand, allow objects to be initialized after they have been built, especially since object references are empty by default, and they have no concerns about variability. This makes object-oriented languages prone to NullPointerExceptions.

Defensive copy

A nice property of immutable objects is immutable. However, in languages like Python and Java, immutability is not transitive — that is, if x is an immutable object, x.y is not guaranteed to be immutable unless it is explicitly defined to be.

That means it’s possible to write code like this.

class ImmutablePerson:
  def __init__(self, name: str, age: int, addresses: List[str) :self._name = name
    self._age = age
    self._addresses = addresses

  # read-only properties
  @property
  def name(self) :return self._name
  @property
  def age(self) :return self._age
  @property
  def addresses(self) :return self._addresses
Copy the code

Then someone came along and accidentally messed up the address list.

def send_letters(message: str, addresses: List[str]):
  # Note: the post office's API only works with with uppercase letters so we
  # need to pre-process the address list
  for i, address in enumerate(addresses):
    addresses[i] = addresses.upper()

  client = PostOfficeClient()
  client.send_bulk_mail(message, addresses)


person = ImmutablePerson("Joe Bloggs".42["123 Fake Street"])

send_letters(
  f"Dear {person.name}, I Nigerian prince. Please help me moving my monies.",
  person.addresses
)

print(person.addresses) # ["123 FAKE STREET"]
Copy the code

While I admit that this example is a bit mannered, it is quite common for functions to modify the parameters they are given. Usually this is good, but when your ImmutablePerson assumes that its address field will never change, some random code on the other side of the project that changes it without you knowing is annoying.

A typical solution is to pre-copy the list so that even if callers try to change its contents, they will change a copy instead of the original address field.

class ImmutablePerson:
  ...

  @property
  def addresses(self) :return self._addresses.copy()
Copy the code

In general, you’ll see defensive copies used anywhere you want to make sure that another piece of code doesn’t modify a shared object at the wrong time.

Given that this is an article about Rust, you can probably guess what the roots are — a combination of aliases and variability.

You can also guess why defensive copying isn’t really necessary when writing Rust code — the “shared immutable XOR single mutable” rule of lifetime and references means that, It is impossible for code to modify something without first requesting mutable access from its original owner or explicitly selecting shared variation by using a type like STD ::sync::Mutex.

conclusion

There are a bunch of other bad habits that I haven’t touched on yet, or haven’t included because I can’t think of a neat example. Finally thanks to everyone who responded to my post with suggestions.