
Zero copy

Zero copy is an I/O optimization technique that moves data quickly and efficiently from the file system to a network interface without copying it from kernel space to user space. It can significantly improve performance for protocols such as FTP or HTTP. Note, however, that not all operating systems support this feature, and it is currently available only for NIO- and epoll-based transfers.

Note also that it cannot be used on file systems that encrypt or compress data, because only the raw content of a file can be transferred as-is. (That raw content may itself be encrypted data, which transfers fine.)

Performance issues with traditional I/O operations

If the server wants to provide file transfer, the simplest way we can think of is to read the file from the disk and send it to the client over a network protocol.

With traditional I/O, data is copied back and forth between user space and kernel space as it is read and written, and the kernel moves data to and from the disk through the operating system's I/O interfaces.

The code usually looks like this and requires two system calls:

read(file, tmp_buf, len);    /* disk -> kernel buffer -> user buffer */
write(socket, tmp_buf, len); /* user buffer -> socket buffer -> NIC  */

The code is simple, just two lines, but a lot happens underneath.

First, there are four context switches between user mode and kernel mode. There are two system calls, read() and write(), and each one first switches from user mode to kernel mode and then, once the kernel has finished its work, switches back from kernel mode to user mode.

The cost of a context switch is not trivial: a single switch takes tens of nanoseconds to a few microseconds. That sounds short, but in high-concurrency scenarios this time accumulates and is amplified, degrading system performance.

Second, there are four data copies: two DMA copies and two CPU copies, described below:

  • First copy: the data on disk is copied into the operating system's kernel buffer. This copy is performed by DMA.
  • Second copy: the data in the kernel buffer is copied into the user buffer, so our application can use it. This copy is performed by the CPU.
  • Third copy: the data in the user buffer is copied into the kernel's socket buffer. This copy is again performed by the CPU.
  • Fourth copy: the data in the kernel socket buffer is copied into the network card's buffer. This copy is performed by DMA.

This simple, traditional file transfer path carries redundant context switches and data copies. In a high-concurrency system that unnecessary overhead adds up and seriously hurts performance.
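To make the cost concrete, here is a minimal sketch of the traditional approach as a plain read/write loop in C. The descriptors, buffer size, and helper name copy_file_to_socket are assumptions for illustration, not code from any particular library:

    #include <unistd.h>

    /* Traditional transfer: each chunk travels disk -> kernel read
     * buffer -> user buffer -> socket buffer -> NIC, costing two
     * system calls (four mode switches) per iteration. */
    ssize_t copy_file_to_socket(int file_fd, int sock_fd)
    {
        char tmp_buf[4096];              /* user-space buffer */
        ssize_t n, total = 0;

        while ((n = read(file_fd, tmp_buf, sizeof(tmp_buf))) > 0) {
            if (write(sock_fd, tmp_buf, (size_t)n) != n)
                return -1;               /* treat a short write as an error */
            total += n;
        }
        return n < 0 ? -1 : total;
    }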

Therefore, to improve the performance of file transfer, you need to reduce the number of user-mode and kernel-mode context switches and memory copies.

Principles of zero-copy technology

Zero copy addresses the problem that the operating system copies data over and over while handling I/O. The main zero-copy techniques are mmap + write, sendfile, and splice.

Virtual memory

Before you get into zero-copy technology, you need to understand the concept of virtual memory.

All modern operating systems use virtual memory, and using virtual addresses instead of physical addresses has the following main benefits:

  • Multiple virtual addresses can map to the same physical address.
  • The virtual address space can be much larger than physical memory.

The first property can be exploited for I/O: if a kernel-space virtual address and a user-space virtual address are mapped to the same physical address, I/O operations no longer need to copy data back and forth between the two.

The following figure shows how virtual memory works.
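As a small demonstration of the first property, the following C sketch maps the same file twice; a write through one mapping is visible through the other because both virtual addresses resolve to the same physical page. The file name data.txt is an assumption for illustration:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* data.txt is assumed to exist and be at least one page long */
        int fd = open("data.txt", O_RDWR);
        char *a = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *b = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (a == MAP_FAILED || b == MAP_FAILED)
            return 1;

        a[0] = 'X';            /* write through the first virtual address */
        printf("%c\n", b[0]);  /* visible through the second: prints X    */
        return 0;
    }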

mmap/write mode

Using mmap/write instead of traditional I/O takes advantage of these virtual memory features. The following diagram shows how mmap/write works:

The core difference is that after the data has been read into the kernel read buffer, mmap exposes that buffer to the application, so a subsequent write copies the data directly from the kernel read buffer to the socket buffer; it is never copied into a separate user buffer. This in-kernel copy still requires the CPU.

This eliminates one CPU copy and improves I/O speed. However, the number of context switches is still four, because the application must still issue the write system call.
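A minimal C sketch of this pattern, assuming file_fd is a regular file, sock_fd is a connected socket, and len is the file size (the helper name mmap_write_send is made up for illustration):

    #include <sys/mman.h>
    #include <unistd.h>

    /* mmap/write: the file's page-cache pages are mapped into user
     * space, so no CPU copy into a user buffer is needed; write()
     * still performs one CPU copy into the socket buffer. */
    ssize_t mmap_write_send(int file_fd, int sock_fd, size_t len)
    {
        void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, file_fd, 0);
        if (addr == MAP_FAILED)
            return -1;

        ssize_t sent = write(sock_fd, addr, len); /* page cache -> socket buffer */
        munmap(addr, len);
        return sent;
    }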

What about reducing the context switches? For that we need sendfile, the next optimization.

Sendfile way

Starting with Linux 2.1, the kernel has provided sendfile to simplify this operation; sendfile can replace mmap/write and optimize further.

sendfile replaces this pair of operations:

  mmap();
  write();

with a single call:

 sendfile();

This reduces the context switches because the application no longer issues a separate write; it issues a single sendfile call instead.

The following diagram shows how sendfile works:

sendfile needs only three data copies (including one CPU copy) and two context switches.
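A hedged C sketch of this call on Linux, assuming file_fd is a regular file, sock_fd is a connected socket, and count is the number of bytes to send (the wrapper name sendfile_send is made up):

    #include <sys/types.h>
    #include <sys/sendfile.h>

    /* sendfile: one system call, and the data never enters user
     * space. On Linux, in_fd must be a file that supports mmap-like
     * operations; out_fd is typically a socket. */
    ssize_t sendfile_send(int sock_fd, int file_fd, size_t count)
    {
        off_t offset = 0;  /* updated by the kernel as data is sent */
        return sendfile(sock_fd, file_fd, &offset, count);
    }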

Can we reduce the CPU copies to zero? For that we need sendfile with scatter/gather support.

Sendfile with Scatter/Gather

The Linux 2.4 kernel optimized sendfile to support scatter/gather DMA, which removes the last CPU copy. In kernel space, data is no longer copied from the read buffer to the socket buffer; instead, only the memory address and offset of the read buffer are recorded in the socket buffer, so no copy is needed. In essence this is the same idea as the virtual memory trick above: record an address instead of copying the data.

The following figure shows how sendfile with scatter/gather works:

sendfile with scatter/gather needs only two data copies (both DMA copies) and two context switches; there is no CPU copy at all. However, this gather capability requires support from the network card and its driver (on Linux, ethtool -k reports whether an interface's scatter-gather feature is enabled).

Splice way

The splice call is very similar to sendfile: the user application provides two open file descriptors, one for input and one for output. Unlike sendfile, splice is not limited to moving data from a file to a socket; it can connect (almost) any two descriptors, with the constraint that at least one side of each splice call must be a pipe. For the special case of sending a file to a socket, sendfile remains the usual interface, but splice is the more general mechanism; in that sense sendfile can be seen as a subset of splice.

splice was introduced in Linux 2.6.17. In Linux 2.6.23 the standalone sendfile implementation was removed, but the API and its behavior remain, now implemented internally on top of splice.

Unlike sendfile with scatter/gather, splice does not require hardware support.
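A minimal C sketch of moving a file to a socket with splice. Because at least one side of each splice call must be a pipe, a pipe serves as the in-kernel intermediary; the helper name splice_send and the minimal error handling are assumptions for illustration:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* splice: data flows file -> pipe -> socket entirely inside the
     * kernel; SPLICE_F_MOVE hints that pages should be moved rather
     * than copied. */
    ssize_t splice_send(int file_fd, int sock_fd, size_t len)
    {
        int p[2];
        if (pipe(p) < 0)
            return -1;

        ssize_t total = 0, n = 0;
        while (total < (ssize_t)len &&
               (n = splice(file_fd, NULL, p[1], NULL,
                           len - (size_t)total, SPLICE_F_MOVE)) > 0) {
            ssize_t left = n;
            while (left > 0) {   /* drain the pipe into the socket */
                ssize_t m = splice(p[0], NULL, sock_fd, NULL,
                                   (size_t)left, SPLICE_F_MOVE);
                if (m <= 0) { close(p[0]); close(p[1]); return -1; }
                left -= m;
            }
            total += n;
        }
        close(p[0]);
        close(p[1]);
        return n < 0 ? -1 : total;
    }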

Conclusion

Whether we use traditional I/O or zero copy, the two DMA copies are unavoidable, because both are performed by hardware. So-called zero copy, then, means eliminating the CPU copies and reducing the context switches.

The following table compares the zero-copy techniques:

Method            CPU copies   DMA copies   System calls   Context switches
Traditional I/O   2            2            read/write     4
Memory mapping    1            2            mmap/write     4
sendfile          1            2            sendfile       2
scatter/gather    0            2            sendfile       2
splice            0            2            splice         2

At the end

I am a coder who is still working hard to improve. If this article helped you, remember to like and follow. Thank you!