Feel free to follow github.com/hsfxuebao. I hope this article helps you; if it does, a Star would be appreciated.

1. What is zero copy

“Zero copy” literally combines two words, “zero” and “copy”:

  • Copy: data is transferred from one storage area to another.
  • Zero: the number of times the data is copied is 0.

Taken together, zero copy means that data does not need to be copied from one storage area to another.

More precisely, zero copy means that the CPU does not have to copy data from one storage area to another while the computer performs an I/O operation, which reduces context switches and CPU copy time. It is an I/O optimization technique.

2. Traditional I/O execution process

If you do server-side development, you have probably implemented file download more than once. Suppose you are building a web application: when a front-end request arrives, the server's job is to send a file from the host's disk out through the connected socket. The key implementation code is as follows:

while((n = read(diskfd, buf, BUF_SIZE)) > 0)
    write(sockfd, buf , n);

Traditional I/O consists of a read phase and a write phase.

  • read: reads data from disk into the kernel buffer, then copies it to the user buffer
  • write: writes data to the socket buffer first, then to the NIC.

The full sequence is:

  • The user application process calls read, issuing an I/O call to the operating system; the context switches from user mode to kernel mode (switch 1)
  • The DMA controller reads data from disk into the kernel buffer
  • The CPU copies the data from the kernel buffer to the user application buffer, the context switches from kernel mode to user mode (switch 2), and read returns
  • The user application process issues an I/O call through write; the context switches from user mode to kernel mode (switch 3)
  • The CPU copies data from the user buffer to the socket buffer
  • The DMA controller copies the data from the socket buffer to the NIC, the context switches from kernel mode back to user mode (switch 4), and write returns

As the steps above show, the traditional I/O read/write path involves four context switches (four switches between user mode and kernel mode) and four data copies (two CPU copies and two DMA copies). What is a DMA copy? Let's review the operating system knowledge involved in zero copy.
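As a rough Java counterpart to the C loop above, the sketch below copies a file through a user-space byte array. The class name TraditionalCopy, the method copy, and the 8 KB buffer size are illustrative choices, not part of the original code; in the download scenario the destination would be the socket's output stream rather than a second file. Every read() and write() on these streams ends up as a system call, so each chunk crosses the user/kernel boundary twice and is copied by the CPU through the buffer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TraditionalCopy {
    // Hypothetical helper: copies src to dst through a user-space buffer
    // and returns the number of bytes copied.
    public static long copy(Path src, Path dst) throws IOException {
        byte[] buf = new byte[8192]; // user-space buffer (assumed size)
        long total = 0;
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            int n;
            // read: kernel buffer -> user buffer; write: user buffer -> kernel
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".txt");
        Path dst = Files.createTempFile("dst", ".txt");
        Files.write(src, "hello zero copy".getBytes());
        System.out.println(copy(src, dst) + " bytes copied");
    }
}
```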

3. Zero copy related knowledge points

3.1 Kernel space and user space

Applications running on a computer have to go through the operating system for certain privileged operations, such as reading and writing disk files or allocating memory. These operations are relatively dangerous, so applications are not allowed to perform them freely; they are left to the underlying operating system. The operating system therefore gives each process an address space that is split into user space and kernel space. Kernel space is the protected region accessed by the operating system kernel, while user space is the region accessed by user applications. On a 32-bit operating system, for example, each process gets a 4 GB (2^32 bytes) address space.

  • Kernel space: mainly provides process scheduling, memory allocation, and access to hardware resources
  • User space: the space given to application processes, which have no direct access to kernel-space resources. If an application process needs such a resource, it must make a system call: the process switches from user space to kernel space, and then back from kernel space to user space once the call completes.

3.2 What are user mode and kernel mode

  • When a process runs in kernel space, it is said to be in kernel mode
  • When a process runs in user space, it is said to be in user mode

3.3 What is Context Switching

  • What is CPU context?

CPU registers are small but very fast memory built into the CPU. The program counter stores the location of the instruction the CPU is executing, or of the next instruction to execute. Together they form the environment the CPU depends on before it can run any task, hence the name CPU context.

  • What is CPU context switching?

The CPU context of the previous task (the CPU registers and program counter) is saved, the context of the new task is loaded into the registers and program counter, and execution jumps to the new location indicated by the program counter to run the new task.

Context switching usually refers to the kernel (the core of the operating system) switching processes or threads on the CPU. A process moves from user mode to kernel mode through a system call, and the CPU context is also switched during a system call.

The location of the user-mode instruction held in the CPU registers is saved first. Then, to execute kernel code, the registers are updated with the location of the kernel instruction. Finally, the CPU jumps to kernel mode and runs the kernel task.

3.4 Virtual Memory

Modern operating systems use virtual memory, that is, programs work with virtual addresses instead of physical addresses. Virtual memory brings two benefits:

  • The virtual memory space can be much larger than the physical memory space
  • Multiple virtual addresses can point to the same physical address

The second property is the one that matters here: because multiple virtual addresses can point to the same physical address, a virtual address in kernel space and one in user space can be mapped to the same physical memory, which reduces the number of data copies during I/O.
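The idea that several virtual addresses can share one physical location can be observed from Java, as a rough illustration. The sketch below maps the same file region twice, writes through the first mapping, and reads through the second. SharedMappingDemo and its method name are hypothetical. On Linux both READ_WRITE mappings are normally backed by the same page-cache pages, so the write is expected to be visible immediately, but the Java specification leaves some of this visibility platform-dependent, so treat it as a demonstration rather than a guarantee.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedMappingDemo {
    // Hypothetical helper: maps the same 16-byte file region twice, stores a
    // byte through the first mapping and returns what the second one reads.
    public static byte writeThroughOneReadThroughOther(Path file, byte value) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer first = ch.map(FileChannel.MapMode.READ_WRITE, 0, 16);
            MappedByteBuffer second = ch.map(FileChannel.MapMode.READ_WRITE, 0, 16);
            first.put(0, value);   // store via the first virtual address range
            return second.get(0);  // load via the second virtual address range
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shared", ".bin");
        Files.write(file, new byte[16]); // the file must already cover the mapped region
        System.out.println(writeThroughOneReadThroughOther(file, (byte) 42));
    }
}
```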

3.5 DMA technology

DMA (Direct Memory Access) is essentially a stand-alone chip on the motherboard that lets peripherals and main memory exchange I/O data directly, without involving the CPU. Let's look at what DMA does during a read:

  • The user application process calls read, issuing an I/O call to the operating system, then blocks waiting for the data.
  • The CPU receives the request and issues a command to the DMA controller.
  • The DMA controller receives the I/O request and forwards it to the disk
  • The disk places the data in the disk controller's buffer and notifies the DMA controller
  • The DMA controller copies the data from the disk controller's buffer to the kernel buffer
  • The DMA controller signals the CPU that the data is ready and hands the work back to the CPU, which copies the data from the kernel buffer to the user buffer
  • The user application process switches from kernel mode back to user mode

As you can see, DMA's job is clear: it forwards I/O requests and copies data on behalf of the CPU. Why is it needed?

Mainly for efficiency: while DMA handles the transfer, the CPU is free to do other work, which improves CPU utilization.

4. Several ways to implement zero copy

Zero copy does not mean that nothing is copied; it means reducing the number of user/kernel mode switches and the number of CPU copies. Zero copy can be implemented in several ways:

  • mmap + write
  • sendfile
  • sendfile with DMA scatter/gather copy

4.1 mmap + write zero copy

The mmap function prototype is as follows:

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

  • addr: specifies the address of the mapped virtual memory
  • length: the length of the mapping
  • prot: the protection mode of the mapped memory
  • flags: the type of mapping
  • fd: the file descriptor to map
  • offset: the offset within the file

Section 3.4 introduced virtual memory: virtual addresses in kernel space and user space can be mapped to the same physical address, which reduces the number of data copies. mmap takes advantage of this feature by mapping the kernel's read buffer onto a buffer in user space, so the file data itself stays in the kernel. The zero-copy process of mmap + write is as follows:

  • The user process issues an I/O call to the operating system kernel through mmap; the context switches from user mode to kernel mode.
  • The CPU uses the DMA controller to copy data from the hard disk to the kernel buffer
  • The context switches from kernel mode to user mode, and mmap returns
  • The user process issues an I/O call to the operating system kernel through write; the context switches from user mode to kernel mode.
  • The CPU copies data from the kernel buffer to the socket buffer
  • The CPU uses the DMA controller to copy data from the socket buffer to the NIC; the context switches from kernel mode to user mode, and write returns.

As you can see, zero copy implemented with mmap + write involves four context switches between user space and kernel space and three data copies: two DMA copies and one CPU copy. mmap maps the address of the kernel read buffer onto the address of the user buffer, so the kernel buffer and the application buffer are shared, which saves one CPU copy. In addition, the user process's memory is virtual and merely maps onto the kernel read buffer, which can save half of the memory space.

4.2 Zero copy with sendfile

sendfile is a system call introduced after the Linux 2.1 kernel. The API is as follows:

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

  • out_fd: the file descriptor to write to, which is a socket descriptor
  • in_fd: the file descriptor to read from. It must refer to a real file, not a socket or a pipe
  • offset: the position in the file at which reading starts. If NULL, reading starts at the current file offset of in_fd.
  • count: the number of bytes to transfer from in_fd to out_fd

sendfile transfers data between two file descriptors entirely inside the operating system kernel. It avoids copying data between the kernel buffer and the user buffer, so it can be used to implement zero copy. The zero-copy process of sendfile is as follows:

  1. The user process issues the sendfile system call, and the context switches from user mode to kernel mode
  2. The DMA controller copies data from disk to the kernel buffer
  3. The CPU copies data from the kernel read buffer to the socket buffer
  4. The DMA controller asynchronously copies data from the socket buffer to the NIC
  5. The context switches from kernel mode to user mode, and the sendfile call returns

As you can see, zero copy implemented with sendfile involves two context switches between user space and kernel space and three data copies: two DMA copies and one CPU copy. Can the number of CPU copies be reduced to zero? Yes, with sendfile plus DMA scatter/gather copy.

4.3 sendfile + DMA scatter/gather zero copy

Since Linux 2.4, sendfile has been optimized with SG-DMA, which adds scatter/gather operations to DMA copying so that data can be read directly from the kernel-space buffer into the NIC. Using this feature for zero copy saves one more CPU copy.

The zero-copy process of sendfile + DMA scatter/gather is as follows:

  1. The user process issues the sendfile system call, and the context switches from user mode to kernel mode (switch 1)
  2. The DMA controller copies data from disk to the kernel buffer
  3. The CPU sends descriptor information for the kernel buffer (its memory address and offset) to the socket buffer
  4. The DMA controller copies data directly from the kernel buffer to the NIC based on that descriptor information
  5. The context switches from kernel mode to user mode (switch 2), and the sendfile call returns

As you can see, zero copy implemented with sendfile + DMA scatter/gather involves two context switches between user space and kernel space and two data copies, both of which are DMA copies. This is true zero-copy technology: the CPU never moves the data, and all data is transferred via DMA.
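To keep the four I/O paths side by side, the counts stated in sections 2 and 4 can be tabulated in a few lines of Java. This is only a summary of the numbers given in the text; the class IoPathSummary and the enum names are made up for illustration and measure nothing.

```java
public class IoPathSummary {
    // Counts exactly as stated in the text for each I/O path.
    public enum IoPath {
        READ_WRITE(4, 2, 2),   // traditional read + write
        MMAP_WRITE(4, 1, 2),   // mmap + write
        SENDFILE(2, 1, 2),     // sendfile
        SENDFILE_SG(2, 0, 2);  // sendfile + DMA scatter/gather

        public final int contextSwitches;
        public final int cpuCopies;
        public final int dmaCopies;

        IoPath(int contextSwitches, int cpuCopies, int dmaCopies) {
            this.contextSwitches = contextSwitches;
            this.cpuCopies = cpuCopies;
            this.dmaCopies = dmaCopies;
        }

        public int totalCopies() {
            return cpuCopies + dmaCopies;
        }
    }

    public static void main(String[] args) {
        for (IoPath p : IoPath.values()) {
            System.out.printf("%-12s switches=%d cpuCopies=%d dmaCopies=%d totalCopies=%d%n",
                    p, p.contextSwitches, p.cpuCopies, p.dmaCopies, p.totalCopies());
        }
    }
}
```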

5. Java zero-copy mode

  • Java NIO support for mmap
  • Java NIO support for sendfile

5.1 Java NIO support for mmap

Java NIO provides the MappedByteBuffer class, which can be used to implement memory mapping. Under the hood it calls the Linux kernel's mmap API.

5.1.1 Off-heap memory

To send data held in the JVM heap to a kernel buffer, the data usually has to be copied from the heap into system memory first. (An unofficial explanation: native methods are implemented in C, and before they can work on the data it must be copied out of the heap's storage structures.) One way to improve performance is therefore to use off-heap memory. However, the data placed in off-heap memory usually has to come from the heap in the first place, so from this angle the performance gain seems limited.
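A minimal sketch of the off-heap approach: ByteBuffer.allocateDirect allocates a buffer outside the JVM heap, and FileChannel can read into it and write from it. The class DirectBufferCopy, the method copy, and the 64 KB buffer size are illustrative assumptions. Note that this still issues read and write calls and still copies between the kernel and the direct buffer; it only avoids the extra hop through a heap array.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectBufferCopy {
    // Hypothetical helper: copies src to dst through an off-heap (direct)
    // buffer and returns the number of bytes written.
    public static long copy(Path src, Path dst) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024); // off-heap, assumed size
        long total = 0;
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (in.read(buf) != -1) {
                buf.flip();                    // switch the buffer from filling to draining
                while (buf.hasRemaining()) {
                    total += out.write(buf);   // write() may drain only part of the buffer
                }
                buf.clear();                   // ready for the next read
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("direct-src", ".bin");
        Path dst = Files.createTempFile("direct-dst", ".bin");
        Files.write(src, new byte[100_000]);
        System.out.println(copy(src, dst) + " bytes copied");
    }
}
```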

5.1.2 Memory mapping (mmap + write)

As the analysis of mmap + write in section 4.1 shows, the core idea of memory mapping is to map the kernel buffer and the user-space buffer to the same physical address, which removes the data copy between the user buffer and the kernel buffer. It does not, however, reduce the number of context switches.

5.1.3 Example

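import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapTest {

    public static void main(String[] args) {
        // Off-heap allocation (not used in this example)
        // ByteBuffer byteBuffer = ByteBuffer.allocateDirect(1024);
        try (FileChannel readChannel = FileChannel.open(Paths.get("./cscw.txt"), StandardOpenOption.READ);
             FileChannel writeChannel = FileChannel.open(Paths.get("./siting.txt"),
                     StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            // Map the whole source file; a read-only mapping cannot extend past the file size
            MappedByteBuffer data = readChannel.map(FileChannel.MapMode.READ_ONLY, 0, readChannel.size());
            // Data transfer: write() may be partial, so loop until the buffer is drained
            while (data.hasRemaining()) {
                writeChannel.write(data);
            }
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}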

5.2 Java NIO support for SendFile

FileChannel's transferTo() and transferFrom() methods use the sendfile() system call under the hood where the platform supports it. The open-source project Kafka relies on this in org.apache.kafka.common.network.PlaintextTransportLayer.transferFrom(), as follows:

public long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
    return fileChannel.transferTo(position, count, socketChannel);
}

Here is a transferTo() demo:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SendFileTest {
    public static void main(String[] args) {
        // try-with-resources closes both channels even if the transfer fails
        try (FileChannel readChannel = FileChannel.open(Paths.get("./jay.txt"), StandardOpenOption.READ);
             FileChannel writeChannel = FileChannel.open(Paths.get("./siting.txt"),
                     StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            long len = readChannel.size();
            long position = readChannel.position();
            // Data transfer
            readChannel.transferTo(position, len, writeChannel);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
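The demo above calls transferTo() once, but the method returns the number of bytes actually transferred, which may be less than requested. A more defensive sketch loops until the whole file has been sent. TransferLoop and transferFully are hypothetical names, and stopping on a zero-byte transfer is one possible policy (for example when the file shrank or a non-blocking target is full), not the only reasonable one.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferLoop {
    // Hypothetical helper: repeats transferTo() until the whole source file
    // has been handed to dst, and returns the number of bytes transferred.
    public static long transferFully(FileChannel src, WritableByteChannel dst) throws IOException {
        long position = 0;
        long size = src.size();
        while (position < size) {
            long sent = src.transferTo(position, size - position, dst);
            if (sent <= 0) {
                break; // nothing moved; stop instead of spinning (assumed policy)
            }
            position += sent;
        }
        return position;
    }

    public static void main(String[] args) throws IOException {
        Path srcPath = Files.createTempFile("transfer-src", ".bin");
        Path dstPath = Files.createTempFile("transfer-dst", ".bin");
        Files.write(srcPath, new byte[1_000_000]);
        try (FileChannel src = FileChannel.open(srcPath, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(dstPath, StandardOpenOption.WRITE)) {
            System.out.println(transferFully(src, dst) + " bytes transferred");
        }
    }
}
```

The same helper accepts a SocketChannel as dst, which is the file-to-socket case that sendfile was designed for.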
