Original title: Study of std::io::Error. Original link: matklad.github.io/2020/10/15/…

In this article, we’ll examine the implementation of the std::io::Error type from Rust’s standard library. The corresponding code lives in library/std/src/io/error.rs.

You can take this article as:

  1. A study of a part of the standard library

  2. An advanced error management guide

  3. A beautiful example of API design

Reading this article requires a basic understanding of Rust’s error handling.


When designing an error type for use with Result<T, E>, the main question is “How will the error be used?”. Usually, one of the following is true.

  • The error is handled programmatically. The caller inspects the error, so its internal structure should be reasonably exposed.

  • The error is propagated and shown to the user. The user does not inspect the error beyond its fmt::Display output, so its internal structure can be encapsulated.

Notice the tension between exposing implementation details and encapsulating them. A common anti-pattern for implementing the first case is to define a kitchen-sink enum:

pub enum Error {
  Tokio(tokio::io::Error),
  ConnectionDiscovery {
    path: PathBuf,
    reason: String,
    stderr: String,
  },
  Deserialize {
    source: serde_json::Error,
    data: String,
  },
  ...,
  Generic(String),
}

But there are many problems with this approach.

First, exposing errors from underlying libraries makes them part of your public API. If a dependency makes a breaking change, you have to make a breaking change as well.

Second, it sets all the implementation details in stone. For example, if you notice that ConnectionDiscovery is huge, boxing that variant would be a breaking change.

Third, it usually points to a larger design issue. A kitchen-sink error packs dissimilar failure modes into one type. But if the failure modes vary widely, the caller probably cannot handle them all in a meaningful way, which suggests the situation looks more like case two.

A cure that often works for the kitchen-sink problem is to push errors to the caller. Consider the following example:

fn my_function() -> Result<i32, MyError> {
  let thing = dep_function()?;
  ...
  Ok(92)
}

my_function calls dep_function, so MyError has to be convertible from DepError. A better way to write the same thing might be this:

fn my_function(thing: DepThing) -> Result<i32, MyError> {
  ...
  Ok(92)
}

In this version, the caller has to invoke dep_function and handle its error. This trades more typing for more type safety. MyError and DepError are now different types, and the caller can handle them separately. If DepError were a variant of MyError, a run-time match would be required to tell them apart.
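A minimal sketch of the second shape may make the trade-off concrete. All names here (DepThing, DepError, MyError, dep_function, caller) are hypothetical and exist only for illustration:

```rust
// Hypothetical types, for illustration only.
#[derive(Debug, PartialEq)]
pub struct DepError;

#[derive(Debug, PartialEq)]
pub struct MyError;

pub struct DepThing(pub i32);

// The dependency call that can fail with its own error type.
pub fn dep_function(input: &str) -> Result<DepThing, DepError> {
    input.parse::<i32>().map(DepThing).map_err(|_| DepError)
}

// my_function takes the already-computed DepThing, so MyError
// does not need a DepError variant or a From<DepError> impl.
pub fn my_function(thing: DepThing) -> Result<i32, MyError> {
    thing.0.checked_add(50).ok_or(MyError)
}

// The caller drives both steps and sees two distinct error types,
// so it can decide separately what to do with each of them.
pub fn caller(input: &str) -> Result<i32, String> {
    let thing = dep_function(input)
        .map_err(|e| format!("dependency failed: {:?}", e))?;
    my_function(thing).map_err(|e| format!("my_function failed: {:?}", e))
}
```

The caller writes one extra line, but no run-time match is needed to find out which layer failed.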

An extreme version of this idea is sans-io programming. Most errors come from I/O; if you push all I/O to the caller, you can skip most of the error handling.

However bad the enum approach is, it does achieve maximal inspectability for the first case.

The second kind of error management, centered on propagation, is typically handled with a boxed trait object. A type like Box<dyn Error> can be constructed from any specific concrete error, can be printed via Display, and can still optionally expose the underlying error via dynamic downcasting. The anyhow crate is a great example of this style.
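As a rough illustration of the boxed trait object style, here is a sketch using only the standard library. The EmptyInput type and the parse_port and is_empty_input functions are made up for this example:

```rust
use std::error::Error;
use std::fmt;

// Hypothetical concrete error, for illustration only.
#[derive(Debug)]
pub struct EmptyInput;

impl fmt::Display for EmptyInput {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "input was empty")
    }
}

impl Error for EmptyInput {}

// Any concrete error can be turned into Box<dyn Error>.
pub fn parse_port(raw: &str) -> Result<u16, Box<dyn Error>> {
    if raw.is_empty() {
        return Err(Box::new(EmptyInput));
    }
    // `?` boxes the ParseIntError automatically.
    let port: u16 = raw.parse()?;
    Ok(port)
}

// Dynamic downcasting optionally recovers the concrete type.
pub fn is_empty_input(err: &(dyn Error + 'static)) -> bool {
    err.downcast_ref::<EmptyInput>().is_some()
}
```

Most callers would only print such an error; the downcast is an escape hatch for the few that need to look inside.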

The case of std::io::Error is interesting because it wants to be both of the above, and more.

  • This is std, so encapsulation and future-proofing are paramount.

  • I/O errors coming from the operating system can often be handled (for example, EWOULDBLOCK).

  • For a systems programming language, it is important to expose the underlying OS error exactly.

  • io::Error can serve as a vocabulary type and should be able to represent some non-OS errors. For example, a Rust Path can contain interior 0 bytes, and opening such a Path should return an io::Error before making a system call.
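The last point can be observed directly. The sketch below (open_nul_path is a made-up helper name) tries to open a path with an interior NUL byte. To my knowledge the resulting error carries no raw OS code and reports the InvalidInput kind, though the exact kind is an implementation detail of std rather than a documented guarantee:

```rust
use std::fs::File;
use std::io;

// Tries to open a path that contains an interior NUL byte and
// returns the resulting error. std rejects such a path itself,
// before any system call is made.
pub fn open_nul_path() -> io::Error {
    File::open("foo\0bar").expect_err("a path with a NUL byte should not open")
}
```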

Here’s what std::io::Error looks like:

pub struct Error {
  repr: Repr,
}

enum Repr {
  Os(i32),
  Simple(ErrorKind),
  Custom(Box<Custom>),
}

struct Custom {
  kind: ErrorKind,
  error: Box<dyn error::Error + Send + Sync>,
}

The first thing to note is that it is an enum internally, but this is a well-hidden implementation detail. To allow inspecting and handling the various error conditions, there is a separate, public, fieldless kind enum.

#[derive(Clone, Copy)]
#[non_exhaustive]
pub enum ErrorKind {
  NotFound,
  PermissionDenied,
  Interrupted,
  ...
  Other,
}

impl Error {
  pub fn kind(&self) -> ErrorKind {
    match &self.repr {
      Repr::Os(code) => sys::decode_error_kind(*code),
      Repr::Custom(c) => c.kind,
      Repr::Simple(kind) => *kind,
    }
  }
}

Although both ErrorKind and Repr are enums, publicly exposing ErrorKind is much less scary. Another point to note is the #[non_exhaustive], Copy, fieldless design: for such an enum there are no plausible alternative designs and no compatibility hazards.
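From the caller’s side, handling errors by category might look like the sketch below. The should_retry function and its policy are hypothetical; only kind() and the ErrorKind variants come from std:

```rust
use std::io::{self, ErrorKind};

// Hypothetical retry policy, for illustration only.
pub fn should_retry(err: &io::Error) -> bool {
    match err.kind() {
        ErrorKind::Interrupted | ErrorKind::WouldBlock | ErrorKind::TimedOut => true,
        ErrorKind::NotFound | ErrorKind::PermissionDenied => false,
        // ErrorKind is #[non_exhaustive], so code outside std must
        // have a wildcard arm. New variants can then be added
        // without breaking this match.
        _ => false,
    }
}
```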

Some io::Errors are just raw OS error codes:

impl Error {
  pub fn from_raw_os_error(code: i32) -> Error {
    Error { repr: Repr::Os(code) }
  }
  pub fn raw_os_error(&self) -> Option<i32> {
    match self.repr {
      Repr::Os(i) => Some(i),
      Repr::Custom(..) => None,
      Repr::Simple(..) => None,
    }
  }
}

The platform-specific sys::decode_error_kind function takes care of mapping error codes to the ErrorKind enum. All this means that code can handle error categories in a cross-platform way by inspecting .kind(). At the same time, if the need arises to handle a very specific error code in an OS-dependent way, that is possible too. The API provides a convenient abstraction without abstracting away important low-level details.
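Both levels of access can be combined in ordinary code. In this sketch, describe is a made-up helper; from_raw_os_error, raw_os_error and kind are the std methods discussed above:

```rust
use std::io;

// Hypothetical helper: reports the raw OS code when there is one,
// and falls back to the portable category otherwise.
pub fn describe(err: &io::Error) -> String {
    match err.raw_os_error() {
        Some(code) => format!("OS error code {} in category {:?}", code, err.kind()),
        None => format!("non-OS error in category {:?}", err.kind()),
    }
}
```

For example, io::Error::from_raw_os_error(2) keeps the code 2 intact (ENOENT on Linux), while .kind() translates it into a portable category.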

A std::io::Error can also be constructed from an ErrorKind:

impl From<ErrorKind> for Error {
  fn from(kind: ErrorKind) -> Error {
    Error { repr: Repr::Simple(kind) }
  }
}

This gives cross-platform access to error-code style error handling. It is handy if you need the cheapest possible errors, because no allocation is involved.
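A small sketch of how library code might use this conversion. The read_u32_le helper is hypothetical; the `.into()` call relies on the From<ErrorKind> impl shown above:

```rust
use std::io::{self, ErrorKind, Read};

// Hypothetical helper: reads exactly four bytes as a little-endian
// u32, or reports UnexpectedEof without allocating an error payload.
pub fn read_u32_le(mut input: impl Read) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    let mut filled = 0;
    while filled < buf.len() {
        match input.read(&mut buf[filled..])? {
            // End of input before the buffer is full.
            0 => return Err(ErrorKind::UnexpectedEof.into()),
            n => filled += n,
        }
    }
    Ok(u32::from_le_bytes(buf))
}
```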

Finally, there is a third, completely custom representation:

impl Error {
  pub fn new<E>(kind: ErrorKind, error: E) -> Error
  where
    E: Into<Box<dyn error::Error + Send + Sync>>,
  {
    Self::_new(kind, error.into())
  }

  fn _new(
    kind: ErrorKind,
    error: Box<dyn error::Error + Send + Sync>,
  ) -> Error {
    Error {
      repr: Repr::Custom(Box::new(Custom { kind, error })),
    }
  }

  pub fn get_ref(
    &self,
  ) -> Option<&(dyn error::Error + Send + Sync + 'static)> {
    match &self.repr {
      Repr::Os(..) => None,
      Repr::Simple(..) => None,
      Repr::Custom(c) => Some(&*c.error),
    }
  }

  pub fn into_inner(
    self,
  ) -> Option<Box<dyn error::Error + Send + Sync>> {
    match self.repr {
      Repr::Os(..) => None,
      Repr::Simple(..) => None,
      Repr::Custom(c) => Some(c.error),
    }
  }
}

Note that:

  • The generic new function delegates to the monomorphic _new function. This improves compile time, because less code has to be duplicated during monomorphization. I think it also improves run time a bit: the _new function is not marked as inline, so a function call is generated at the call site. This is good, because error construction is the cold path and saving instruction cache is welcome.

  • The Custom variant is boxed to keep the overall size_of smaller. The on-stack size of an error is important: you pay for it even when there are no errors.

  • Both of these types refer to a 'static error:

type A = &(dyn error::Error + Send + Sync + 'static);
type B = Box<dyn error::Error + Send + Sync>;

In a dyn Trait + '_, the '_ is elided to 'static, unless the trait object is behind a reference, in which case it is elided as &'a dyn Trait + 'a.

  • get_ref, get_mut and into_inner provide full access to the underlying error. As with raw_os_error, the abstraction blurs the details but also provides hooks to get the underlying data as-is.
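The accessors above can be combined with downcasting to get a concrete error back out. This is only a sketch: BadChecksum and the three helper functions are hypothetical, while Error::new, get_ref, into_inner, downcast_ref and downcast are std APIs:

```rust
use std::error::Error;
use std::fmt;
use std::io;

// Hypothetical domain error, for illustration only.
#[derive(Debug, PartialEq)]
pub struct BadChecksum {
    pub expected: u32,
    pub actual: u32,
}

impl fmt::Display for BadChecksum {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "bad checksum: expected {}, got {}", self.expected, self.actual)
    }
}

impl Error for BadChecksum {}

// Wraps the domain error in the Custom representation.
pub fn wrap(expected: u32, actual: u32) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, BadChecksum { expected, actual })
}

// Borrows the payload if it is a BadChecksum.
pub fn peek(err: &io::Error) -> Option<&BadChecksum> {
    err.get_ref()?.downcast_ref::<BadChecksum>()
}

// Takes ownership of the payload if it is a BadChecksum.
pub fn unwrap_checksum(err: io::Error) -> Option<BadChecksum> {
    let inner = err.into_inner()?;
    inner.downcast::<BadChecksum>().ok().map(|boxed| *boxed)
}
```

Errors built from a raw OS code or a bare ErrorKind have no payload, so peek and unwrap_checksum return None for them.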

Similarly, the Display implementation reveals the most important details about the internal representation.

impl fmt::Display for Error {
  fn fmt(&self, fmt: &mut fmt::Formatter<'_>) -> fmt::Result {
    match &self.repr {
      Repr::Os(code) => {
        let detail = sys::os::error_string(*code);
        write!(fmt, "{} (os error {})", detail, code)
      }
      Repr::Simple(kind) => write!(fmt, "{}", kind.as_str()),
      Repr::Custom(c) => c.error.fmt(fmt),
    }
  }
}
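To see the three branches from the outside, one can build an error of each representation and format it. The rendered helper is made up for this sketch. The OS message text depends on the platform, so the sketch only relies on the "(os error N)" suffix visible in the implementation above:

```rust
use std::io::{self, ErrorKind};

// Formats one error per internal representation: Os, Simple, Custom.
pub fn rendered() -> [String; 3] {
    [
        // Os: platform message followed by "(os error 2)".
        io::Error::from_raw_os_error(2).to_string(),
        // Simple: a fixed description of the kind.
        io::Error::from(ErrorKind::NotFound).to_string(),
        // Custom: delegates to the Display of the wrapped error.
        io::Error::new(ErrorKind::Other, "disk on fire").to_string(),
    ]
}
```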

To sum up, std::io::Error:

  • encapsulates its internal representation and optimizes it by boxing the large enum variant,

  • provides a convenient way to handle errors by category via the ErrorKind pattern,

  • fully exposes the underlying OS error, if there is one,

  • can transparently wrap any other error type.

This last point means that io::Error can be used for ad-hoc errors, because &str and String are convertible to Box<dyn Error + Send + Sync>:

io::Error::new(io::ErrorKind::Other, "something went wrong")

It can also be used as a simple replacement for anyhow. I think some libraries might simplify their error handling with this:

io::Error::new(io::ErrorKind::InvalidData, my_specific_error)

For example, serde_json provides the following function:

fn from_reader<R, T>(rdr: R) -> Result<T, serde_json::Error>
where
  R: Read,
  T: DeserializeOwned,

Read can fail with io::Error, so serde_json::Error needs to be able to represent io::Error internally. I think this is backwards (but I don’t know the full context, and I’d be happy to be proven wrong), and the signature should have looked like this instead:

fn from_reader<R, T>(rdr: R) -> Result<T, io::Error>
where
  R: Read,
  T: DeserializeOwned,

Then serde_json::Error would not need an Io variant; instead, it would be stashed inside an io::Error with the InvalidData kind.
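Here is a sketch of that shape using a toy format rather than JSON. FormatError and read_pair are hypothetical and are not part of serde_json. The point is only that genuine I/O failures pass through unchanged, while format problems travel inside an io::Error of kind InvalidData:

```rust
use std::error::Error;
use std::fmt;
use std::io::{self, Read};

// Hypothetical format error, for illustration only.
#[derive(Debug)]
pub struct FormatError(pub String);

impl fmt::Display for FormatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid format: {}", self.0)
    }
}

impl Error for FormatError {}

// Reads text of the form `x,y` and parses two integers.
pub fn read_pair<R: Read>(mut rdr: R) -> io::Result<(i32, i32)> {
    let mut text = String::new();
    // Genuine I/O failures are returned as they are.
    rdr.read_to_string(&mut text)?;

    // Format failures are wrapped with the InvalidData kind.
    let invalid = |msg: &str| {
        io::Error::new(io::ErrorKind::InvalidData, FormatError(msg.to_string()))
    };
    let (x, y) = text
        .trim()
        .split_once(',')
        .ok_or_else(|| invalid("expected `x,y`"))?;
    let x: i32 = x.trim().parse().map_err(|_| invalid("x is not an integer"))?;
    let y: i32 = y.trim().parse().map_err(|_| invalid("y is not an integer"))?;
    Ok((x, y))
}
```

A caller that cares about the details can still reach the FormatError through get_ref and a downcast.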

I think std::io::Error is a truly remarkable type that manages to serve many different use cases without much compromise. But can we do better?

The number one problem with std::io::Error is that, when a file system operation fails, you don’t know which path it failed for. This is understandable: Rust is a systems language, so it shouldn’t add much on top of what the OS provides natively. The OS returns an integer return code, and coupling it with a heap-allocated PathBuf could be an unacceptable overhead.

I was surprised to learn that std in fact allocates for virtually every path-related system call. It has to, in some form: OS APIs require a zero byte at the end of the string, which Rust strings do not carry. But I wonder if it would make sense to use a stack-allocated buffer for short paths. Probably not: paths are usually not that short, and modern allocators handle transient allocations efficiently.

I don’t know of an obviously good solution here. One option would be to add a switch, either at compile time (once we get std-aware cargo) or at run time (like RUST_BACKTRACE), that heap-allocates all path-related I/O errors. A similarly shaped problem is that io::Error does not carry a backtrace.

The other problem is that std::io::Error is not as efficient as it could be.
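Until something like that exists, a common workaround is to attach the path by hand at the call site. The helper below is a hypothetical sketch, not a std API. It has a cost the text already hints at: the new error uses the Custom representation, so it allocates and no longer exposes raw_os_error:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: reads a file and, on failure, rebuilds the
// error with the same kind and a message that includes the path.
pub fn read_to_string_with_path(path: &Path) -> io::Result<String> {
    fs::read_to_string(path).map_err(|err| {
        io::Error::new(err.kind(), format!("{}: {}", path.display(), err))
    })
}
```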

  • Its size is pretty big:
assert_eq!(size_of::<io::Error>(), 2 * size_of::<usize>());
  • For the custom case, it incurs double indirection and allocation:
enum Repr {
  Os(i32),
  Simple(ErrorKind),
  // First Box :|
  Custom(Box<Custom>),
}

struct Custom {
  kind: ErrorKind,
  // Second Box :(
  error: Box<dyn error::Error + Send + Sync>,
}

I think we can fix this now!

First, we can get rid of the double indirection by using a thin trait object, in the style of failure or anyhow. Now that GlobalAlloc exists, this is a relatively straightforward implementation.

Second, we can use the fact that pointers are aligned and stash both the Os and Simple variants into a usize with the least significant bit set. I think we can even get creative and use the second least significant bit, leaving the first one as a niche. That way, even something like an io::Result can be pointer-sized!

That concludes this article. The next time you design an error type for your library, take a moment to look through the source of std::io::Error; you might find something to borrow.
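The tagging idea can be sketched in safe code by treating the word as a plain integer. This is a simplified variation under stated assumptions, not how std implements it: it uses a 64-bit word, reserves the two low bits as a tag, and stands in for the pointer with a bare aligned address. All names are hypothetical:

```rust
// Two low bits of the word act as a tag. An address aligned to at
// least 4 bytes always has these bits clear.
const TAG_MASK: u64 = 0b11;
const TAG_CUSTOM: u64 = 0b00; // aligned pointer to the custom payload
const TAG_OS: u64 = 0b01; // raw OS error code
const TAG_SIMPLE: u64 = 0b10; // bare error kind, stored as a small integer

#[derive(Debug, PartialEq)]
pub enum Unpacked {
    Custom(u64),
    Os(i32),
    Simple(u8),
}

pub fn pack_os(code: i32) -> u64 {
    // Go through u32 so that negative codes do not sign-extend.
    ((code as u32 as u64) << 2) | TAG_OS
}

pub fn pack_simple(kind: u8) -> u64 {
    ((kind as u64) << 2) | TAG_SIMPLE
}

pub fn pack_custom(addr: u64) -> u64 {
    // Only addresses with clear low bits can be stored untagged.
    assert_eq!(addr & TAG_MASK, TAG_CUSTOM);
    addr
}

pub fn unpack(bits: u64) -> Unpacked {
    match bits & TAG_MASK {
        TAG_CUSTOM => Unpacked::Custom(bits),
        TAG_OS => Unpacked::Os((bits >> 2) as u32 as i32),
        TAG_SIMPLE => Unpacked::Simple((bits >> 2) as u8),
        // The tag 0b11 is never produced by the pack functions above.
        _ => unreachable!("unused tag"),
    }
}
```

A real implementation would store an actual pointer and need unsafe code to dereference and free it; the sketch only shows that all three cases fit in one word.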


Bonus puzzler

Take one more look at this line from the implementation: Repr::Custom(c) => c.error.fmt(fmt)

impl fmt::Display for Error {
  fn fmt(&self, fmt: &mut fmt::Formatter<'_>) -> fmt::Result {
    match &self.repr {
      Repr::Os(code) => {
        let detail = sys::os::error_string(*code);
        write!(fmt, "{} (os error {})", detail, code)
      }
      Repr::Simple(kind) => write!(fmt, "{}", kind.as_str()),
      Repr::Custom(c) => c.error.fmt(fmt),
    }
  }
}
  1. Why does this line of code even work?

  2. How does it work?