Introduction
The standard I/O interface of the traditional Linux operating system is based on data copying: I/O operations transfer data between buffers in the operating system kernel address space and buffers defined in the application address space. The biggest benefit of this design is that it reduces disk I/O, because if the requested data is already in the operating system’s cache, no physical disk I/O needs to be performed at all. However, the copying that takes place during data transmission carries a large CPU cost, which limits the operating system’s ability to move data efficiently.
Zero-copy is a technique that can significantly improve the performance of data transfer. It reduces or eliminates unnecessary CPU copy operations when kernel drivers (such as the network stack or disk storage drivers) process I/O data. Modern CPU and storage architectures provide many features that make zero-copy practical to implement. However, because storage architectures are very complex, and because the network protocol stack sometimes must perform necessary processing on the data, zero-copy can also produce significant negative side effects and may even lose its advantages entirely.
Why is zero-copy technology needed?
Today, many web servers are based on the client-server model: the client requests data or a service from the server, and the server responds with the data the client needs. As web services have grown in popularity, applications such as video have grown rapidly with them. Today’s computer systems have sufficient capacity on the client side to handle the heavy load such applications create, but the server side struggles to cope with the network traffic they generate, and as the number of clients grows rapidly, the server becomes ever more likely to turn into a performance bottleneck. On heavily loaded servers, the operating system is often the culprit. For example, when a system call for a “write” or “send” operation is issued, the operating system usually copies the data from a buffer in the application address space into a buffer in the operating system kernel. A simple interface is the advantage of this design, but it comes at a significant cost to system performance, because the copy not only consumes CPU time slices but also requires additional memory bandwidth.
In general, the client sends its request to the server through the network interface card; the operating system passes the request to the server application; the server application processes the request; and when processing is complete, the operating system sends the result back to the client through the network adapter.
The following section gives a brief introduction to how a traditional server transfers data and to the problems that can cause server performance loss.
The data transfer process of a traditional server in Linux
Traditional I/O in Linux is buffered I/O, and the data involved in an I/O transfer usually has to be copied between buffers several times. In the typical scenario, the user application allocates a buffer of appropriate size to hold the data to be transferred, reads a piece of data from a file, and sends that data to the receiver over the network. The application only needs to invoke two system calls, read() and write(), to complete the transfer, and it is unaware of the copy operations the operating system performs along the way. In Linux, the kernel copies the data multiple times during transmission, for purposes such as packet segmentation and checksum verification. In some cases, these copy operations can significantly degrade data transfer performance.
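To make this concrete, here is a minimal sketch (an illustration, not code from the original article; the function name and buffer size are arbitrary) that transfers a file to a socket in the traditional way. Copies #1 (disk to kernel page cache) and #4 (kernel socket buffer to network card) are typically done by DMA; copies #2 and #3 are done by the CPU.

```c
/* Traditional file-to-socket transfer: every byte crosses the
 * user/kernel boundary twice. */
#include <unistd.h>

/* Copy the contents of in_fd (an open file) to out_fd (a connected socket). */
int traditional_transfer(int in_fd, int out_fd)
{
    char buf[4096];                 /* user-space staging buffer */
    ssize_t n;

    /* read(): copy #2, kernel page cache -> user buffer (CPU). */
    while ((n = read(in_fd, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {
            /* write(): copy #3, user buffer -> kernel socket buffer (CPU). */
            ssize_t w = write(out_fd, buf + off, n - off);
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;
}
```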
When an application needs to access a piece of data, the operating system kernel first checks whether that data is already held in a buffer in the kernel address space because of an earlier access to the same file. If it is not, the Linux kernel reads the data from disk into a kernel buffer. If the read is performed by DMA, the CPU needs to do little more than buffer management, plus setting up and completing the DMA transfer; once the DMA finishes, the operating system is notified so it can continue processing. The kernel then copies the data into the requesting application’s address space, at the address specified in the read() system call. When the application later issues a write() system call, the operating system copies the data once more, from the buffer in the user application address space into a kernel buffer associated with the network stack, which is also CPU-intensive. After that copy completes, the data is packetized and handed to the network interface card, and while the transfer proceeds the application can go on to other work. Once write() has returned, the contents of the user application buffer can be safely discarded or changed, because the operating system keeps its own copy in the kernel buffer, a copy it can release once the data has been successfully delivered to the hardware.
As the description above shows, data is copied at least four times during this traditional transfer, and even when DMA handles communication with the hardware, the CPU still has to touch the data twice. When read() fetches the data, it does not come straight from the hard disk but must first pass through the operating system’s file system layer. When write() is called, the data must be segmented to match the size of the packets to be transmitted, packet headers must be prepared, and data checksums must be computed.

Figure 1. Traditional data transfer using the read and write system calls
Overview of zero-copy technology
What is zero copy?
Simply put, zero-copy is a technique that avoids having the CPU copy data from one region of storage to another. Zero-copy techniques in operating system device drivers, file systems, and network protocol stacks have greatly improved the performance of certain applications and allow those applications to use system resources more efficiently. The improvement comes from letting the CPU perform other tasks while the data copy proceeds. Zero-copy reduces the number of data copy and shared-bus operations and eliminates unnecessary intermediate copies of transmitted data in memory, which effectively improves data transfer efficiency. Furthermore, it reduces the overhead of context switching between the user application address space and the operating system kernel address space. Performing large numbers of data copies is a simple task, and from the operating system’s perspective it is a waste of resources to keep the CPU occupied with it; if other, simpler system components can do the work instead, freeing the CPU for other things, system resources are used more efficiently. In summary, the goals of zero-copy can be summarized as follows:
Avoid data copying
- Avoid copying data between operating system kernel buffers.
- Avoid copying data between the operating system kernel and the user application address space.
- User applications can access the hardware storage directly, bypassing the operating system.
- Let DMA do the data transfer whenever possible.
Combine multiple operations
- Avoid unnecessary system calls and context switches.
- Data that needs to be copied can be cached first.
- Let the hardware handle the data as much as possible.
As mentioned earlier, zero-copy is very important for high-speed networks, because the link speed of such networks approaches or even exceeds the processing power of the CPU. In that situation the CPU may spend almost all of its time copying the data to be transferred, with no capacity left for anything else, creating a performance bottleneck that limits the communication rate and thus the effective throughput of the network link. As a rule of thumb, one CPU clock cycle can process roughly one bit of data, so a 1 GHz processor can keep up with traditional copy operations on a 1 Gbit/s link, but on a 10 Gbit/s network zero-copy becomes essential for the same processor. Zero-copy is already used on links faster than 1 Gbit/s in supercomputer clusters and in large commercial data centers. As information technology advances and 1 Gbit/s, 10 Gbit/s, and 100 Gbit/s networks become more common, zero-copy will become more common as well, since link speeds are increasing much faster than CPU processing power. Traditional data copying is constrained by traditional operating systems and communication protocols, which limits data transfer performance. By reducing the number of data copies and simplifying the protocol processing layers, zero-copy provides faster data transmission between applications and the network, effectively reducing communication latency and increasing network throughput. Zero-copy is one of the main techniques for building high-speed network interfaces in hosts and routers.
Modern CPU and storage architectures provide many features that reduce or avoid unnecessary CPU copy operations during I/O, but this advantage is often overestimated. The complexity of the storage architecture and the data processing required by network protocols can cause problems, sometimes cancelling the benefits of zero-copy entirely. The next chapter introduces several zero-copy techniques that have emerged in the Linux operating system, briefly describes how they are implemented, and analyzes their weaknesses.
Zero-copy technology classification
Zero-copy has developed in many directions, and many zero-copy techniques exist, but no single technique suits every scenario. Linux offers a number of them, most introduced in different kernel versions; some older techniques have evolved considerably across kernel versions or have gradually been replaced by newer ones. This article groups these techniques by the scenarios they apply to. In summary, zero-copy in Linux mainly includes the following:
- Direct I/O: the application accesses the hardware storage directly, and the operating system kernel only assists in the transfer. This kind of zero-copy is intended for cases where the kernel does not need to process the data at all: the data is transferred directly between the buffer in the application address space and the disk, without support from the page cache provided by the Linux kernel. (A sketch of this case follows the list.)
- Avoid copying data between buffers in the operating system kernel address space and buffers in the user application address space during transfer. Sometimes the application does not need to access the data while it is being transferred, so the copy from Linux’s page cache into the user process’s buffer can be avoided entirely and the data can be processed within the page cache itself. In some special cases this achieves very good performance. Linux provides system calls such as mmap(), sendfile(), and splice() for this purpose. (mmap() and sendfile() sketches follow the list.)
- Optimize the transfer of data between the Linux page cache and the user process buffer. This class of zero-copy focuses on copying data flexibly between the user process buffer and the operating system’s page cache. It keeps the traditional communication model but is more flexible; in Linux, the approach mainly relies on copy-on-write.
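As promised above, here is a minimal sketch of the first category, direct I/O. Opening a file with O_DIRECT bypasses the page cache at the cost of strict alignment requirements on the user buffer. The file name “data.bin” and the 4096-byte alignment are placeholder assumptions; real code should query the device’s logical block size.

```c
/* Direct I/O sketch: read a file while bypassing the kernel page cache. */
#define _GNU_SOURCE            /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t align = 4096;      /* assumed alignment; check the device in real code */
    void *buf;

    /* O_DIRECT requires the buffer (and usually offset/length) to be aligned. */
    if (posix_memalign(&buf, align, align) != 0)
        return 1;

    int fd = open("data.bin", O_RDONLY | O_DIRECT);   /* "data.bin" is a placeholder */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    ssize_t n = read(fd, buf, align);   /* DMA'd straight into the user buffer */
    if (n < 0)
        perror("read");
    else
        printf("read %zd bytes without touching the page cache\n", n);

    close(fd);
    free(buf);
    return 0;
}
```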
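For the second category, one option is mmap(): mapping the file aliases the kernel’s page cache into the process, so the kernel-to-user copy that read() would perform disappears, though write() still copies from the mapped pages into the socket buffer. A sketch, assuming in_fd is an open file and out_fd a connected socket:

```c
/* mmap() + write() sketch: the file's page-cache pages are mapped into the
 * process, avoiding read()'s kernel -> user copy. */
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int mmap_send(int in_fd, int out_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;

    /* Map the file read-only; the mapping aliases the kernel page cache. */
    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, in_fd, 0);
    if (p == MAP_FAILED)
        return -1;

    /* One CPU copy instead of two: mapped pages -> kernel socket buffer.
     * (A robust version would loop on short writes.) */
    ssize_t sent = write(out_fd, p, st.st_size);

    munmap(p, st.st_size);
    return sent == st.st_size ? 0 : -1;
}
```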
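sendfile() goes further: the kernel moves the data from the page cache toward the socket without it ever entering the process’s address space. A sketch under the same assumptions:

```c
/* sendfile() sketch: kernel-internal transfer, no user-space copy. */
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Send the whole file in_fd to the socket out_fd using sendfile(). */
int zero_copy_send(int in_fd, int out_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;

    off_t offset = 0;
    while (offset < st.st_size) {
        /* The kernel reads from the page cache and hands the pages to the
         * network stack; the data never enters this process's address space. */
        ssize_t sent = sendfile(out_fd, in_fd, &offset, st.st_size - offset);
        if (sent < 0)
            return -1;
    }
    return 0;
}
```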
The first two classes of methods aim to avoid buffer copies between the application address space and the operating system kernel address space, and they usually suit special situations, such as transferring data that neither the kernel nor the application needs to process. The third class keeps the traditional model of moving data between the application address space and the kernel address space, and optimizes that movement itself. Data transfer between hardware and software can be done with DMA, which involves the CPU hardly at all and frees it to do other work; but when data must move between a buffer in the user address space and the Linux kernel’s page cache, no DMA-like mechanism exists, and the CPU must be fully involved in the copy. The purpose of this third class of methods is therefore to make data transfer between the user address space and the operating system kernel address space more efficient.
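Although this third category concerns the kernel’s page cache, the underlying copy-on-write mechanism can be observed from user space with fork(), as in the illustrative sketch below (a general demonstration of the mechanism, not the page-cache optimization itself): parent and child share physical pages until one of them writes, at which point the kernel copies only the page that was touched.

```c
/* Copy-on-write demonstration via fork(): the child's write triggers a
 * private copy of just the page it touches; the parent's data is untouched. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char shared[4096] = "original";

int main(void)
{
    pid_t pid = fork();          /* parent and child now share all pages */
    if (pid == 0) {
        strcpy(shared, "modified");   /* write fault: kernel copies this one page */
        printf("child sees:  %s\n", shared);
        _exit(0);
    }
    wait(NULL);
    printf("parent sees: %s\n", shared);   /* still "original" */
    return 0;
}
```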