Pin, Unpin, and Why Rust Needs Them
Using Rust's asynchronous libraries is usually easy. It's just like writing normal Rust code, except that you write async and .await in various places. But writing your own asynchronous library can be very difficult. When I first tried it, I was confused by obscure syntax like T: ?Unpin and Pin<&mut Self>. I had never seen these types before and couldn't understand what they were for. Now that I understand them, I've written the explainer I wish I could have gone back and read. In this article, we will learn:
- What is a Future?
- What are self-referential types?
- Why are they unsafe?
- How do Pin / Unpin keep them safe?
- How to use Pin / Unpin to write complex nested Futures
What is a Future?
A few years ago, I needed to write some code that would take an async function, run it, and collect some metrics, for example how long it took to resolve. I wanted to write a TimedWrapper type that would be used like this:
// Some async function, e.g. polling a URL with [https://docs.rs/reqwest]
// Remember, Rust functions do nothing until you .await them, so this isn't
// actually making a HTTP request yet.
let async_fn = reqwest::get("http://adamchalmers.com");

// Wrap the async function in my hypothetical wrapper.
let timed_async_fn = TimedWrapper::new(async_fn);

// Call the async function, which will send a HTTP request and time it.
let (resp, time) = timed_async_fn.await;
println!("Got a HTTP {} in {}ms", resp.unwrap().status(), time.as_millis())
I like this interface; it's simple and should be easy for the rest of the team to use. Let's make it happen! Under the hood, a Rust async function is just a regular function that returns a Future. The Future trait is very simple. It means that a type:
- Can be polled
- When it is polled, returns either Pending or Ready
- If it is Pending, should be polled again later
- If it is Ready, carries a value. We call this "resolving".
Here is a simple example of implementing a Future that returns a random u16.
use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};

/// A future which returns a random number when it resolves.
#[derive(Default)]
struct RandFuture;

impl Future for RandFuture {
    // Every future has to specify what type of value it returns when it resolves.
    // This particular future will return a u16.
    type Output = u16;

    // The `Future` trait has only one method, named "poll".
    fn poll(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Self::Output> {
        Poll::Ready(rand::random())
    }
}
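To make the Pending / Ready protocol concrete, here is a minimal sketch of polling a future by hand. Everything in it is an assumption for illustration and not part of the original article: the Countdown future, the NoopWaker type, and the poll_to_completion helper are hypothetical names, and the busy loop stands in for a real executor, which would sleep until the waker fires.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future: returns Pending `remaining` times, then resolves.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            // Ask to be polled again; a real future would register the
            // waker with whatever event it is waiting on.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker that does nothing, just enough to build a Context.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls in a loop until the future resolves, counting the polls.
/// Requires `Unpin` so that `Pin::new` can be used without `unsafe`.
fn poll_to_completion<F: Future + Unpin>(mut fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = Pin::new(&mut fut).poll(&mut cx) {
            return (output, polls);
        }
    }
}

fn main() {
    let (output, polls) = poll_to_completion(Countdown { remaining: 3 });
    println!("resolved to {:?} after {} polls", output, polls);
}
```

Note that the helper only accepts Unpin futures. Lifting that restriction is exactly the problem the rest of the article works through.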
It doesn’t seem difficult! I think we are ready to implement TimedWrapper.
Trying a nested Future, and getting stuck
Let’s define a type first.
pub struct TimedWrapper<Fut: Future> {
    start: Option<Instant>,
    future: Fut,
}
OK, so TimedWrapper is a generic type with Fut: Future. It will store a Future as a field and a start field to record when it was first polled. Let’s write a constructor.
impl<Fut: Future> TimedWrapper<Fut> {
    pub fn new(future: Fut) -> Self {
        Self { future, start: None }
    }
}
There's nothing too complicated here. The new method takes a Future and wraps it in a TimedWrapper. Of course, we must set start to None because it has not been polled yet. So let's implement the poll method. That is all we need to implement the Future trait, which makes TimedWrapper usable with .await.
impl<Fut: Future> Future for TimedWrapper<Fut> {
    // This future will output a pair of values:
    // 1. The value from the inner future
    // 2. How long it took for the inner future to resolve
    type Output = (Fut::Output, Duration);

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        // Call the inner poll, measuring how long it took.
        let start = self.start.get_or_insert_with(Instant::now);
        let inner_poll = self.future.poll(cx);
        let elapsed = start.elapsed();

        match inner_poll {
            // The inner future needs more time, so this future needs more time too
            Poll::Pending => Poll::Pending,
            // Success!
            Poll::Ready(output) => Poll::Ready((output, elapsed)),
        }
    }
}
OK, that wasn't too hard. There's just one problem: it doesn't compile.
The Rust compiler reports an error on self.future.poll(cx), telling us that no method named poll was found for Fut in the current scope. This is confusing, because we know that Fut is a Future, so it must have a poll method, right? The compiler adds that Fut itself has no poll method, but Pin<&mut Fut> does. So what is this strange type?
We know that methods have a receiver, which is how they access self. The receiver can be self, &self, or &mut self, which mean taking ownership of self, borrowing self, and mutably borrowing self, respectively. So Pin<&mut Self> is a new, unfamiliar kind of receiver. The compiler is telling us that because poll takes self: Pin<&mut Self>, we need a Pin<&mut Fut> in order to call it on our Fut. This raises two questions:
- What is Pin?
- If we have a value of type T, how do we get a Pin<&mut T>?
The rest of this article addresses these two questions by explaining some of the problems in Rust that can lead to unsafe code and why Pin can safely resolve them.
Self-reference is not safe
Pin exists to solve a very specific problem: self-referential data types, i.e. data structures that hold pointers into themselves. For example, a binary search tree might have self-referential pointers that point to other nodes within the same struct.
Self-referential types are very useful, but they are also hard to keep memory safe. To see why, let's use an example struct with two fields: an i32 called val, and a pointer to an i32 called pointer.
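The two-field struct described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the article's own code: the SelfRef name and the pointer_goes_stale_after_move helper are hypothetical. It uses a raw pointer because safe references cannot express this layout, and it only compares addresses, never dereferencing the stale pointer. It also demonstrates the hazard that the following paragraphs explain.

```rust
/// Hypothetical self-referential struct: `pointer` is meant to point at `val`.
struct SelfRef {
    val: i32,
    pointer: *const i32,
}

/// Returns true if the pointer was valid before a move and stale after it.
fn pointer_goes_stale_after_move() -> bool {
    let mut a = SelfRef {
        val: 42,
        pointer: std::ptr::null(),
    };
    // Make the struct self-referential: `pointer` now holds the address of `val`.
    a.pointer = &a.val as *const i32;
    let valid_before = std::ptr::eq(a.pointer, &a.val);

    // Moving the struct to the heap copies its bytes to a new address.
    // The `pointer` field is copied as-is, so it still holds the old address.
    let b = Box::new(a);
    let stale_after = !std::ptr::eq(b.pointer, &b.val);

    valid_before && stale_after
}

fn main() {
    println!("pointer went stale: {}", pointer_goes_stale_after_move());
}
```

Dereferencing b.pointer at this point would require unsafe and would be undefined behavior, which is why the sketch stops at comparing addresses.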
Initially, everything is fine. The pointer field points to the val field at memory address A, which contains a valid i32. All pointers are valid, and the memory they point to holds a value of the correct type (in this case, i32). But the Rust compiler often moves values around in memory. For example, if we pass this struct into another function, it may be moved to a different memory address. Or we might wrap it in a Box and put it on the heap. Or, if the struct is in a Vec and we push more values, the Vec may outgrow its capacity and need to move all its elements into a new, larger buffer.
When we move it, the struct's fields change their addresses, but not their values. So the pointer field still points to address A, but address A no longer holds a valid i32. The data that was there has been moved to address B, and some other value may have been written over address A! So now pointer is invalid. This is bad: at best an invalid pointer crashes the program, at worst it is an exploitable vulnerability. We only want to allow memory-unsafe behavior inside unsafe code, so we would have to document the type very carefully and tell users to update the pointer after every move.
Unpin and !Unpin
To recap, all Rust types fall into two categories:
- Types that are safe to move around in memory. This is the default and the norm. It includes, for example, primitives like numbers, strings, and booleans, as well as structs or enums made up entirely of them. Most types fall into this category!
- Self-referential types, which are not safe to move around in memory. These are very rare. One example is the intrusive linked list inside some Tokio internals. Another is most types that implement Future and also borrow data, for reasons explained in the Rust async book.
Types in category 1 are completely safe to move around in memory. You can move them without invalidating any pointers. But if you move a type in category 2, you invalidate its pointers and may get undefined behavior. As we saw earlier, in earlier versions of Rust you had to be very careful with these types, either never moving them or using unsafe to update all the pointers after a move. But since Rust 1.33, the compiler can automatically work out which category any type falls into and ensure that it is only used safely.
Any type in category 1 implements a special auto trait called Unpin. It's a strange name, but its meaning will soon become clear. Most common types implement Unpin, and because it is an auto trait (like Send or Sync), you don't have to worry about implementing it yourself. If you're not sure whether a type can be safely moved, just check on docs.rs whether it implements Unpin!
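Besides reading docs.rs, you can ask the compiler directly. The sketch below is a common compile-time trick rather than anything from the original article; assert_unpin and PlainData are hypothetical names. The call only compiles if the type implements Unpin. The standard library marker std::marker::PhantomPinned is !Unpin, and so is any struct containing it.

```rust
use std::marker::PhantomPinned;

/// Compiles only when `T: Unpin`. The body is intentionally empty:
/// the trait bound is the whole check.
fn assert_unpin<T: Unpin>() {}

/// An ordinary struct made of Unpin fields is automatically Unpin.
#[allow(dead_code)]
struct PlainData {
    id: u64,
    name: String,
}

/// A struct that opts out of Unpin by containing `PhantomPinned`.
#[allow(dead_code)]
struct NotUnpin {
    value: i32,
    _marker: PhantomPinned,
}

fn main() {
    assert_unpin::<i32>();
    assert_unpin::<String>();
    assert_unpin::<Vec<bool>>();
    assert_unpin::<PlainData>();

    // Uncommenting the next line is a compile error, because
    // `PhantomPinned` (and therefore `NotUnpin`) does not implement Unpin:
    // assert_unpin::<NotUnpin>();

    println!("all checked types are Unpin");
}
```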
Types in category 2 have the equally creative name !Unpin (the ! means "does not implement"). To use these types safely, we can't use regular pointers for self-reference. Instead, we use special pointers that "pin" their values in place, ensuring they can't be moved. This is exactly what the Pin type does.
Pin wraps a pointer and prevents the value it points to from being moved. The only exception is when the value is Unpin, in which case we know it is safe to move. Voila! Now we can write self-referential structs safely! This matters because, as discussed above, many Futures are self-referential, and we need them for async/.await.
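The Unpin exception can be seen in a small sketch. It assumes nothing beyond the standard library, but the Immovable type and both helper functions are hypothetical names chosen for illustration. With an Unpin target, Pin hands back a plain &mut and the value can be moved out. With a !Unpin target, the same call is rejected at compile time and only shared access through Deref remains.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical !Unpin type: `PhantomPinned` opts it out of Unpin.
struct Immovable {
    val: i32,
    _marker: PhantomPinned,
}

/// String is Unpin, so pinning it is safe and does not restrict moves.
fn move_out_of_unpin_pin() -> String {
    let mut s = String::from("hello");
    let pinned: Pin<&mut String> = Pin::new(&mut s);
    // `get_mut` is only available when the target is Unpin.
    let inner: &mut String = pinned.get_mut();
    // Moves the String out, leaving an empty one behind.
    std::mem::take(inner)
}

/// `Box::pin` pins a value on the heap, whether or not it is Unpin.
fn read_through_pinned_box() -> i32 {
    let boxed: Pin<Box<Immovable>> = Box::pin(Immovable {
        val: 7,
        _marker: PhantomPinned,
    });
    // Shared access through Deref is always allowed.
    // But `boxed.as_mut().get_mut()` would not compile here, because
    // `Immovable` is !Unpin and a `&mut` would let us move it.
    boxed.val
}

fn main() {
    println!("{}", move_out_of_unpin_pin());
    println!("{}", read_through_pinned_box());
}
```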
Using Pin
Now we understand why Pin exists, and why the receiver of our Future's poll method is Pin<&mut Self> instead of a regular &mut self. So let's go back to the problem we ran into earlier: we need a pinned reference to the inner Future.
More generally: given a pinned struct, how do we access its fields?
The solution is to write helper functions that give you references to the fields. These references can be normal references like &mut, or they can be pinned too; you choose whichever you need. This is called projection: if you have a pinned struct, you can write a projection method that gives you access to all of its fields.
Projection is really just getting data into and out of a Pin. For example, we get the start: Option<Instant> field out of the Pin<&mut Self>, and we need to put the future: Fut field into a Pin so that we can call its poll method. If you read the documentation for Pin's methods, you'll see that this is always safe when the pointer points to an Unpin value, but requires unsafe otherwise.
// Putting data into Pin
pub fn new<P: Deref<Target: Unpin>>(pointer: P) -> Pin<P>;
pub unsafe fn new_unchecked<P>(pointer: P) -> Pin<P>;

// Getting data from Pin
pub fn into_inner<P: Deref<Target: Unpin>>(pin: Pin<P>) -> P;
pub unsafe fn into_inner_unchecked<P>(pin: Pin<P>) -> P;
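For a sense of what a hand-written projection involves, here is one possible sketch for TimedWrapper. It is an assumption-laden illustration, not the article's code: the project method, the NoopWaker type, and the run_ready_future helper are hypothetical, and the safety comment lists the invariants this sketch relies on. It uses Pin::get_unchecked_mut and Pin::new_unchecked from the standard library.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

pub struct TimedWrapper<Fut: Future> {
    start: Option<Instant>,
    future: Fut,
}

impl<Fut: Future> TimedWrapper<Fut> {
    pub fn new(future: Fut) -> Self {
        Self { future, start: None }
    }

    /// Hypothetical hand-written projection: a plain reference to `start`
    /// and a pinned reference to `future`.
    fn project(self: Pin<&mut Self>) -> (&mut Option<Instant>, Pin<&mut Fut>) {
        // SAFETY (assumed invariants of this sketch): `future` is only ever
        // exposed behind `Pin` and never moved out of `self`; this type has
        // no `Drop` impl and is not `#[repr(packed)]`; `start` is never
        // treated as pinned, so handing out `&mut` to it is fine.
        unsafe {
            let this = self.get_unchecked_mut();
            (&mut this.start, Pin::new_unchecked(&mut this.future))
        }
    }
}

impl<Fut: Future> Future for TimedWrapper<Fut> {
    type Output = (Fut::Output, Duration);

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let (start, future) = self.project();
        let start = start.get_or_insert_with(Instant::now);
        match future.poll(cx) {
            Poll::Pending => Poll::Pending,
            Poll::Ready(output) => Poll::Ready((output, start.elapsed())),
        }
    }
}

/// A waker that does nothing, just enough to build a Context.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls a wrapped, already-ready future once and returns its output.
fn run_ready_future() -> (u8, Duration) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut timed = Box::pin(TimedWrapper::new(std::future::ready(7u8)));
    match timed.as_mut().poll(&mut cx) {
        Poll::Ready(result) => result,
        Poll::Pending => unreachable!("`ready` resolves on the first poll"),
    }
}

fn main() {
    let (value, elapsed) = run_ready_future();
    println!("got {} after {:?}", value, elapsed);
}
```

The compiler cannot check the invariants in the SAFETY comment. Upholding them is the author's job, and every later change to the struct has to preserve them.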
I know unsafe sounds scary, but writing unsafe code is still acceptable. I think of unsafe as the compiler saying, "Hey, I can't tell whether this code follows the rules here, so I'm going to rely on you to check for me." The Rust compiler does a lot of work for us, and it's only fair that we do a little of it from time to time. If you want to learn how to write your own projection methods, I strongly recommend this article on the subject: fasterthanli.me/articles/PI… . But we're going to take a shortcut :)
Switching to pin-project
Okay, time to be honest: I don't like writing unsafe code. I know I just explained why it's okay, but given the choice, who would use unsafe? Not me.
I didn't start writing Rust because I wanted to think carefully about the consequences of my code. I just want to go fast and not break things. Luckily, someone took pity on me and wrote a crate that generates fully safe projections! It's called pin-project, and it's great. All we need to do is add a few attributes to the definition:
#[pin_project::pin_project] // This generates a `project` method
pub struct TimedWrapper<Fut: Future> {
    // For each field, we need to choose whether `project` returns an
    // unpinned (&mut T) or pinned (Pin<&mut T>) reference to the field.
    // By default, it assumes unpinned:
    start: Option<Instant>,
    // Opt into pinned references with this attribute:
    #[pin]
    future: Fut,
}
For each field, you must choose whether its projection should be pinned. By default, you should use plain references because they are simpler. But if you know you need a pinned reference, for example because you want to call poll, whose receiver is Pin<&mut Self>, you can opt in with #[pin].
Now we can finally poll the internal Future!
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
    // This returns a type with all the same fields, with all the same types,
    // except that the fields defined with #[pin] will be pinned.
    let mut this = self.project();

    // Call the inner poll, measuring how long it took.
    let start = this.start.get_or_insert_with(Instant::now);
    let inner_poll = this.future.as_mut().poll(cx);
    let elapsed = start.elapsed();

    match inner_poll {
        // The inner future needs more time, so this future needs more time too
        Poll::Pending => Poll::Pending,
        // Success!
        Poll::Ready(output) => Poll::Ready((output, elapsed)),
    }
}
Finally, our goal was accomplished — we wrote all the code and didn’t use any unsafe.
Conclusion
If a Rust type has self-referential pointers, it cannot be moved safely. After all, moving it does not update the pointers, so they still point to the old memory address and are now invalid. Rust can automatically tell which types are safe to move, and marks them Unpin. If you have a pinned pointer to some data, Rust guarantees that nothing unsafe will happen: you can move the data if it is safe to move, and you get a compile error if it is not. This matters because many Future types are self-referential, so we need Pin to poll a Future safely. You probably won't have to call poll yourself (just use async/.await), but if you do, use the pin-project crate to simplify your code.
The original link: blog.adamchalmers.com/pin-unpin/