By Mr. Chi (Skyzh) / Edited by Zhang Handong
This article introduces the basics of using io_uring, then describes how I implemented an asynchronous file I/O library on top of it, and finally benchmarks it against mmap.
TL;DR
In short: in skyzh/uring-io I wrapped the low-level io_uring interface provided by Tokio and implemented asynchronous random file reads based on io_uring in Rust. You can use it like this:
```rust
ctx.read(fid, offset, &mut buf).await?;
```
A brief introduction to io_uring
io_uring is an asynchronous I/O interface provided by the Linux kernel. It was released in Linux 5.1 in May 2019 and is already being used in various projects. For example:
- RocksDB's MultiRead now reads files concurrently through `io_uring`.
- Tokio wraps a layer of API around `io_uring`. When Tokio 1.0 was released, the developers stated that true asynchronous file operations would be provided through io_uring (see Announcing Tokio 1.0). Tokio's asynchronous file operations are currently implemented by calling the synchronous API on a separate I/O thread.
- QEMU 5.0 already uses `io_uring` (see the ChangeLog).
Most current io_uring tests compare its Direct I/O performance with Linux AIO (1), (2), (3). io_uring typically achieves about twice the performance of AIO.
Random file read scenario
In database systems, we often need multiple threads reading the contents of a file at arbitrary positions. The commonly used read/write APIs cannot do this (because you must seek first, which requires exclusive access to the file handle). The following methods can implement random file reads:
- `mmap` maps the file directly into memory. The file is read directly through memory, and multiple threads can read it concurrently.
- `pread` can read `count` bytes starting from a given position `offset`, and also supports concurrent reads from multiple threads.
However, both options block the current thread. For example, if reading mmap-ed memory triggers a page fault, the current thread blocks; pread is itself a blocking API. Asynchronous APIs such as Linux AIO and io_uring can reduce context switching, thereby improving throughput in certain scenarios.
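For reference, here is a minimal sketch of the pread approach in safe Rust, using the standard library's `FileExt::read_at` (the file name and offsets are made up for illustration):

```rust
use std::fs::File;
use std::os::unix::fs::FileExt;
use std::sync::Arc;
use std::thread;

fn main() -> std::io::Result<()> {
    let file = Arc::new(File::open("data.bin")?);
    let handles: Vec<_> = (0u64..8)
        .map(|i| {
            let file = Arc::clone(&file);
            thread::spawn(move || {
                let mut buf = [0u8; 4096];
                // read_at = pread(2): no seek is needed, so the shared
                // handle can be used concurrently from many threads.
                file.read_at(&mut buf, i * 4096)
            })
        })
        .collect();
    for handle in handles {
        let n = handle.join().unwrap()?;
        println!("read {} bytes", n);
    }
    Ok(())
}
```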
Basic usage of io_uring
The io_uring related syscalls can be found here. liburing provides an easier-to-use API. On top of that, Tokio's io-uring crate provides the io_uring API in Rust. The following uses this crate as an example.
To use io_uring, you first need to create a ring. Here we use the concurrent API provided by tokio-rs/io-uring so that multiple threads can share the same ring.
```rust
use io_uring::IoUring;

let ring = IoUring::new(256)?;
let ring = ring.concurrent();
```
Each ring corresponds to a submission queue and a completion queue; here each is set to hold at most 256 pending entries.
Doing I/O with io_uring is a three-step process: add tasks to the submission queue, submit the tasks to the kernel, and harvest finished tasks from the completion queue. The following uses file reading as an example to walk through the whole process.
`opcode::Read` constructs a file-read task, and `ring.submission().push(entry)` adds the task to the submission queue.
```rust
use io_uring::{opcode, types::Fixed};

// Build a read task: read `len` bytes at `offset` from the registered file `fid`.
let read_op = opcode::Read::new(Fixed(fid), ptr, len).offset(offset);
let entry = read_op
    .build()
    .user_data(user_data);
// Safety: `ptr` must remain valid until the read completes.
unsafe { ring.submission().push(entry)?; }
```
Once the task has been queued, submit it to the kernel.
```rust
assert_eq!(ring.submit()?, 1);
```
Finally, the completed tasks are polled.
```rust
loop {
    if let Some(entry) = ring.completion().pop() {
        // do something
    }
}
```
In this way, we can implement random file reads based on io_uring.
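Putting the three steps together, here is a complete minimal example adapted from the io-uring crate's documentation. It uses a plain `Fd` instead of `Fixed`, which avoids file registration; exact method signatures vary slightly between crate versions, and `data.bin` is a placeholder file name:

```rust
use io_uring::{opcode, types, IoUring};
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(256)?;
    let file = File::open("data.bin")?;
    let mut buf = vec![0u8; 4096];

    // Step 1: build a read of 4096 bytes at offset 8192 and queue it.
    let read_e = opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), buf.len() as _)
        .offset(8192)
        .build()
        .user_data(0x42);
    unsafe {
        ring.submission()
            .push(&read_e)
            .expect("submission queue is full");
    }

    // Step 2: submit and block until at least one task completes.
    ring.submit_and_wait(1)?;

    // Step 3: harvest the completion and check the result.
    let cqe = ring.completion().next().expect("completion queue is empty");
    assert_eq!(cqe.user_data(), 0x42);
    assert!(cqe.result() >= 0, "read error: {}", cqe.result());
    println!("read {} bytes", cqe.result());
    Ok(())
}
```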
Note 1: io_uring currently has three execution modes: the default mode, polled I/O mode (IOPOLL), and kernel submission polling mode (SQPOLL). When kernel polling mode is used, you don't necessarily need to call the submit function to submit tasks.
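For illustration, kernel polling mode can be enabled through the crate's builder. A hedged sketch follows; the builder method names are from recent io-uring crate versions, and SQPOLL may require elevated privileges on older kernels:

```rust
use io_uring::IoUring;

fn main() -> std::io::Result<()> {
    let _ring = IoUring::builder()
        // A kernel thread polls the submission queue, going idle after 2000 ms
        // without new entries.
        .setup_sqpoll(2000)
        .build(256)?;
    // While the kernel thread is awake, queued SQEs are picked up without an
    // explicit submit() syscall; submit() then mostly serves to wake it up.
    Ok(())
}
```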
Implementing an asynchronous file-read interface with io_uring
Our goal is to implement an interface like the one below, wrapping io_uring so that developers only need to call a simple read function.
```rust
ctx.read(fid, offset, &mut buf).await?;
```
After studying tokio-linux-aio's asynchronous wrapper for Linux AIO, I took the following approach to implement io_uring based asynchronous reads.
- Before using `io_uring`, the developer needs to create a `UringContext`. When the `UringContext` is created, one (or more) `UringPollFuture` tasks start running in the background to submit tasks and poll for completions (corresponding to steps 2 and 3 in the previous section).
- The developer calls the file-read interface through the `ctx`, using `ctx.read` to create a `UringReadFuture`. After calling `ctx.read(...).await`:
  - The `UringReadFuture` creates a `UringTask` object pinned in memory, then pushes the read task onto the submission queue, using the address of the `UringTask` as the user data of the read operation. The `UringTask` contains a channel.
  - The `UringPollFuture` submits the tasks in the background.
  - The `UringPollFuture` polls for completed tasks in the background.
  - The `UringPollFuture` takes out the user data, restores it to a `UringTask` object, and notifies the `UringReadFuture` through the channel that the I/O operation has completed.
The whole process is shown below.
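To make the user-data round trip concrete, here is a minimal self-contained sketch of the idea. The names follow the article, but this is not the actual skyzh/uring-io code:

```rust
use tokio::sync::oneshot;

/// Completion state for one in-flight read.
struct UringTask {
    completion: Option<oneshot::Sender<i32>>,
}

// On submission: move the task to the heap and encode its address as the
// operation's user_data.
fn into_user_data(task: Box<UringTask>) -> u64 {
    Box::into_raw(task) as u64
}

// On completion: recover the task from user_data and notify the waiting
// future through its channel.
unsafe fn complete(user_data: u64, result: i32) {
    let mut task = Box::from_raw(user_data as *mut UringTask);
    if let Some(tx) = task.completion.take() {
        let _ = tx.send(result);
    }
}
```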
In this way, we can easily call io_uring to implement asynchronous file reads. As a side benefit, task submission is batched automatically. Normally, each I/O operation incurs one syscall. But since we use a single future to submit and poll tasks, there may be several unsubmitted tasks in the queue at submission time, and they can all be submitted at once. This reduces syscall (context switch) overhead (and of course increases latency somewhat). As measured in the benchmark, each submission batched up about 20 read tasks.
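The batching falls out of the background loop naturally. Below is a hedged sketch of what the body of such a submit-and-poll future might look like, reusing the concurrent-API calls shown earlier; the task-notification step is elided into a comment:

```rust
use std::sync::Arc;

// Illustrative only: the real UringPollFuture is a hand-written Future, not
// an async fn, and the concurrent API shown here is from io-uring 0.4.
async fn poll_loop(ring: Arc<io_uring::concurrent::IoUring>) -> std::io::Result<()> {
    loop {
        // One submit() flushes every entry queued since the last call, so
        // reads queued by many user futures share a single syscall.
        ring.submit()?;
        // Drain everything that has completed so far.
        while let Some(cqe) = ring.completion().pop() {
            let _user_data = cqe.user_data();
            // recover the UringTask behind user_data and fire its channel here
        }
        // Yield to the executor so user futures can queue more reads.
        tokio::task::yield_now().await;
    }
}
```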
Benchmark
Now let's compare the performance of the wrapped io_uring against mmap. The workload was 128 files of 1 GB each, randomly reading aligned 4 KB blocks. My machine has 32 GB of RAM and a 1 TB NVMe SSD. The following six cases were tested:
- 8-thread mmap (mmap_8)
- 32-thread mmap (mmap_32)
- 512-thread mmap (mmap_512)
- 8 threads with 8 concurrent `io_uring` reads (uring_8)
- 8 threads with 32 concurrent `io_uring` reads, i.e. 8 worker threads running 32 futures that read simultaneously (uring_32)
- 8 threads with 512 concurrent `io_uring` reads (uring_512)
Throughput (op/s) and latency (ns) were measured.
| case | throughput (op/s) | p50 (ns) | p90 (ns) | p999 (ns) | p9999 (ns) | max (ns) |
|---|---|---|---|---|---|---|
| uring_8 | 104085.78 | 83166 | 109183 | 246416 | 3105883 | 14973666 |
| uring_32 | 227097.61 | 142869 | 212730 | 1111491 | 3321889 | 14336132 |
| uring_512 | 212076.52 | 1973421 | 3521119 | 19478348 | 25551700 | 35433481 |
| mmap_8 | 109697.87 | 78971 | 107021 | 204211 | 1787823 | 18522047 |
| mmap_32 | 312829.53 | 100336 | 178914 | 419955 | 4408214 | 55129932 |
| mmap_512 | 235368.99 | 2556429 | 3265266 | 15946744 | 50029659 | 156095218 |
It turns out mmap beats io_uring hands down. Well, sure enough, my wrapper isn't great, but at least it works. Below are heatmaps of latency over one minute. Each pair of plots is presented in the order mmap, then io_uring.
mmap_8 / uring_8
mmap_32 / uring_32
mmap_512 / uring_512
Some possible improvements
- It looks like `io_uring` doesn't perform very well after my Tokio-based wrapping. A next step is to compare Rust and C performance on `io_uring` nop instructions to measure the overhead introduced by the Tokio wrapper (see the sketch after this list).
- Test Direct I/O performance. Only buffered I/O has been tested so far.
- Compare against Linux AIO. (Surely the performance can't be worse than Linux AIO... (cry))
- Use perf to find the current bottleneck. At present, `cargo flamegraph` fails with "unable to allocate memory" once `io_uring` is hooked up. Maybe there will be a sequel.
- Currently, users must guarantee that `&mut buf` stays valid for the whole duration of the read. If the future is aborted, memory can leak. See futures-rs for similar issues: Github.com/rust-lang/f…. Tokio's current I/O implementation solves this problem by copying twice: first into a cache, then to the user.
- Perhaps writing files and other operations could be wrapped as well.
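As a starting point for the first item, here is a minimal sketch that measures raw ring overhead by submitting `Nop` operations, which perform no actual I/O. The loop count and structure are illustrative, and it uses the plain (non-concurrent) API of a recent io-uring crate:

```rust
use io_uring::{opcode, IoUring};
use std::time::Instant;

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(256)?;
    let rounds: u64 = 100_000;
    let start = Instant::now();
    for i in 0..rounds {
        // Queue one no-op, submit it, and wait for its completion.
        let nop = opcode::Nop::new().build().user_data(i);
        unsafe { ring.submission().push(&nop).expect("queue full") };
        ring.submit_and_wait(1)?;
        ring.completion().next().expect("completion queue is empty");
    }
    println!("avg ns per nop: {}", start.elapsed().as_nanos() / rounds as u128);
    Ok(())
}
```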
About the author:
Mr. Chi (Skyzh) is a junior at Shanghai Jiao Tong University, a maintainer of the SJTUG mirror site, and addicted to writing Rust.
From the February issue of the Rust Chinese community journal (Rust_Magazine).