Definition:
Zero-copy means the CPU does not have to copy data from one memory region to another while the computer performs an I/O operation, which reduces both context switching and the CPU time spent on copying.
Benefits:
The zero-copy mechanism reduces the number of redundant I/O copies of data between the kernel buffer and the user process buffer. It also reduces the CPU overhead caused by context switching between the user process address space and the kernel address space.
Traditional I/O mode:
Zero copy mode:
User mode direct I/O:
Applications access the storage hardware directly, and the operating system kernel only assists with the data transfer. There is still a context switch between user space and kernel space, but data from the hardware is copied straight into user space without passing through a kernel-space buffer. Direct I/O therefore involves no copy between a kernel-space buffer and a user-space buffer.
mmap + write
mmap is a memory-mapped file mechanism provided by Linux: a range of virtual addresses in a process's address space is mapped to a file on disk.
- The user process makes a system call to the kernel through mmap(), switching the context from user mode to kernel mode.
- The kernel maps the read buffer in kernel space to the buffer in the user process's address space, so both refer to the same memory.
- The CPU uses the DMA controller to copy data from main memory or the hard disk into the read buffer in kernel space.
- The context switches from kernel mode back to user mode, and the mmap() call returns.
- The user process makes a system call to the kernel via write(), switching the context from user mode to kernel mode.
- The CPU copies the data from the read buffer to the network buffer.
- The CPU uses the DMA controller to copy the data from the network buffer to the network card for transmission.
- The context switches from kernel mode back to user mode, and the write() call returns.
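The mmap + write steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name send_file_with_mmap is hypothetical, the target is any connected stream socket, and Python's sock.sendall() stands in for the write() system call.

```python
import mmap
import os
import socket


def send_file_with_mmap(path, sock):
    """Send the file at `path` over `sock` using mmap + write. Returns bytes sent."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        # mmap(): the file's pages become visible in this process's address
        # space, so no read() copy into a separate user buffer is needed.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # write(): the kernel copies the mapped pages into the socket
            # buffer; sendall() loops until every byte has been handed over.
            sock.sendall(mapped)
    return size
```

Because the mapping is visible to the process, the application could still inspect the bytes (for example, slice out a single message) before sending them, which a pure in-kernel transfer does not allow.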
sendfile
With the sendfile system call, data can be transferred entirely within kernel space, eliminating the need to copy it back and forth between user space and kernel space. Unlike mmap memory mapping, the I/O data in a sendfile call is completely invisible to user space. In other words, it is a pure data transfer: the application cannot inspect or modify the data in transit.
- The user process makes a system call to the kernel via sendfile(), and the context switches from user mode to kernel mode.
- The CPU uses the DMA controller to copy data from main memory or the hard disk into the read buffer in kernel space.
- The CPU copies the data from the read buffer to the network buffer.
- The CPU uses the DMA controller to copy the data from the network buffer to the network card for transmission.
- The context switches from kernel mode back to user mode, and the sendfile() call returns.
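As a rough sketch of the sendfile flow above, Python exposes the system call as os.sendfile(). The function name send_file_zero_copy is hypothetical, and the sketch assumes a Unix-like system and a blocking, connected socket.

```python
import os
import socket


def send_file_zero_copy(path, sock):
    """Send the file at `path` over `sock` with os.sendfile(). Returns bytes sent."""
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # The kernel moves the bytes from the file to the socket; they
            # never pass through a buffer owned by this process.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:
                break  # the file shrank while it was being sent
            sent += n
    return sent
```

The loop is there because a single sendfile call may transfer fewer bytes than requested, so the offset is advanced until the whole file has been sent.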
Reduce the number of data copies:
During data transfer, avoid CPU copies between the user-space buffer and the kernel-space buffer, as well as CPU copies within kernel space. This is the idea behind today's mainstream zero-copy implementations.
Copy-on-write technology: when multiple processes share the same piece of data and one of them needs to modify it, that process first copies the data into its own address space. If the data is only read, no copy is needed.
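One way to observe copy-on-write from user space is a private memory mapping. The sketch below is an assumption-laden illustration rather than the only form the technique takes: it uses Python's mmap.ACCESS_COPY mode on a temporary file, and the function name cow_demo is hypothetical. Reads are served from the shared file pages, and only a write triggers a private copy, so the file itself stays unchanged.

```python
import mmap
import os
import tempfile


def cow_demo():
    """Return (bytes seen through the private mapping, bytes left on disk)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"shared data")
    try:
        with open(tmp.name, "r+b") as f:
            # ACCESS_COPY creates a private copy-on-write mapping: pages are
            # shared with the file until this process writes to them.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as private:
                private[0:6] = b"SHARED"  # the write triggers the private copy
                in_memory = private[:]
        with open(tmp.name, "rb") as f:
            on_disk = f.read()  # the underlying file was never modified
    finally:
        os.unlink(tmp.name)
    return in_memory, on_disk
```

Calling cow_demo() returns the modified bytes seen through the mapping alongside the original bytes still stored in the file.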
RocketMQ versus Kafka
RocketMQ uses mmap + write as its zero-copy approach, which suits data persistence and transmission of small chunks of business-level messages.
Kafka uses sendfile as its zero-copy approach, which suits data persistence and transmission of large files with high throughput, such as system log messages. It's worth noting, however, that Kafka uses mmap + write for index files and sendfile for data files (via Java's FileChannel.transferTo method).