Understanding zero copy

Zero copy is a long-standing, well-known optimization for I/O operations on Linux. Today we are going to look at it from the top down (from the application layer toward the lower layers, without going into kernel internals).

What is zero copy

To describe zero copy concretely, this article starts with the simple scenario of a network server sending a file stored on disk to a client over the network. In the traditional approach, this is mostly network I/O: the data is copied at least 4 times, and several user/kernel context switches occur. The process takes the following four steps:

Step 1: The application issues a read system call, which switches context from user mode to kernel mode. The contents of the file on disk are read and stored in a kernel address space buffer.

Step 2: The data is copied from the kernel buffer to the user buffer, and the read system call returns. The return of the call causes a context switch from kernel mode back to user mode. The data now sits in a user address space buffer and can continue on its way.

Step 3: The write system call causes a context switch from user mode to kernel mode, and a third copy is performed, putting the data into a kernel address space buffer again. This time, however, the data goes into a different buffer, one associated with the socket.

Step 4: The write system call returns, causing a fourth context switch. The data is copied from the socket buffer out to the network interface (the fourth copy), and the server-side logic for this network transmission ends here.
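In application code, this traditional path is the familiar read-then-write copy loop. Here is a minimal Java sketch of it (the file name, host, and port are placeholders chosen for illustration):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

class TraditionalCopyClient {
    public static void main(String[] args) throws IOException {
        // Placeholder file and destination; adjust for your environment.
        try (FileInputStream in = new FileInputStream("demo.txt");
             Socket socket = new Socket("localhost", 9026);
             OutputStream out = socket.getOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            // read(): disk -> kernel buffer -> user buffer (copies 1 and 2)
            while ((n = in.read(buf)) != -1) {
                // write(): user buffer -> socket buffer -> NIC (copies 3 and 4)
                out.write(buf, 0, n);
            }
        }
    }
}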

As the steps above show, the data is copied as many as 4 times during the whole transmission, with multiple switches between user mode and kernel mode. Is it possible to reduce the number of copies and improve network I/O efficiency? The answer is yes.

So what exactly is zero copy? With zero copy, data is transferred directly from the kernel buffer to the socket buffer, without ever passing through a user-mode buffer. The name "zero copy" is relative to user mode: no copy into user space is made.

Broadly speaking, this is zero copy from the operating system's point of view, because the data is not duplicated between kernel buffers. Besides avoiding the extra copies, zero copy brings other performance benefits: fewer context switches, less CPU data-cache pollution, and no CPU checksum calculation (when the NIC supports checksum offload).

Zero copy Java implementation

FileChannel in Java NIO has two methods, transferTo and transferFrom, which copy data directly from a FileChannel to another Channel, or from another Channel directly into a FileChannel. This interface is commonly used for efficient network/file data transfer and for copying large files. With operating system support, transferring data this way does not require copying the source data from kernel mode to user mode and then from user mode into the target channel's kernel buffer, and it avoids two context switches between user mode and kernel mode; in other words, it uses "zero copy".

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

/** disk-NIC zero copy: the receiving server */
class ZeroCopyServer {
    ServerSocketChannel listener = null;

    public static void main(String[] args) {
        ZeroCopyServer dns = new ZeroCopyServer();
        dns.mySetup();
        dns.readData();
    }

    protected void mySetup() {
        InetSocketAddress listenAddr = new InetSocketAddress(9026);
        try {
            listener = ServerSocketChannel.open();
            ServerSocket ss = listener.socket();
            ss.setReuseAddress(true);
            ss.bind(listenAddr);
            System.out.println("Listening port: " + listenAddr);
        } catch (IOException e) {
            System.out.println("Port binding failed: " + listenAddr
                    + ", the port may already be in use, cause: " + e.getMessage());
            e.printStackTrace();
        }
    }

    private void readData() {
        ByteBuffer dst = ByteBuffer.allocate(4096);
        try {
            while (true) {
                SocketChannel conn = listener.accept();
                System.out.println("Connection created: " + conn);
                conn.configureBlocking(true);
                int nread = 0;
                while (nread != -1) {
                    try {
                        nread = conn.read(dst);
                    } catch (IOException e) {
                        e.printStackTrace();
                        nread = -1;
                    }
                    // Reset the position so the next read reuses the buffer.
                    dst.rewind();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
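The server above only receives bytes; the zero-copy part happens on the sending side. Below is a minimal client sketch that uses FileChannel.transferTo to send a file to the server, assuming the server above is running on localhost:9026 and that a placeholder file demo.txt exists:

import java.io.FileInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

class ZeroCopyClient {
    public static void main(String[] args) throws IOException {
        // Placeholder file and address; adjust for your environment.
        try (FileInputStream fis = new FileInputStream("demo.txt");
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9026))) {
            FileChannel fileChannel = fis.getChannel();
            long position = 0;
            long size = fileChannel.size();
            // transferTo may transfer fewer bytes than requested, so loop until done.
            // With OS support this maps to sendfile: the data never enters user space.
            while (position < size) {
                position += fileChannel.transferTo(position, size - position, socket);
            }
        }
    }
}

With transferTo, the four-copy path described earlier collapses: the file contents go from the kernel buffer to the socket without ever being copied into a user-mode buffer.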

A little digression

The zero-copy idea can also guide the optimization of our own systems when tuning I/O. I recently learned that Kafka's ability to sustain high throughput has a lot to do with its heavy reliance on the underlying operating system's page cache, which is why, for Kafka, a larger JVM heap is not necessarily better. Kafka also uses zero copy to reduce copies of data between kernel mode and user mode and the associated context switches.

Kafka

To compensate for the performance gap between disk and memory, modern operating systems increasingly use main memory to cache the disk. A modern OS will aggressively use all free memory for disk caching, at the cost of some performance when that memory is reclaimed. All disk reads and writes go through this unified cache, and the feature cannot easily be turned off without using direct I/O. So even if a process maintains its own in-process cache, the data will likely also be duplicated in the operating system's page cache, effectively storing everything twice.

In addition, Kafka is built on top of the JVM, and anyone who knows anything about Java memory usage knows two things:

  • The memory overhead of objects is very high, often twice (or more) the amount of data being stored.
  • As the amount of data in the heap increases, Java garbage collection becomes increasingly complex and slow.

Because of these factors, using the file system and page cache is superior to maintaining an in-memory cache or other structure: the available cache is at least doubled by having automatic access to all free memory, and likely doubled again by storing compact byte structures rather than individual objects. Doing so yields a cache of up to 28-30GB on a 32GB machine, without any additional GC burden. In addition, this cache stays warm even if the service is restarted, whereas an in-process cache would need to be rebuilt in memory (rebuilding a 10GB cache can take 10 minutes) or else start completely cold (which means poor initial performance). This also greatly simplifies the code, because all the logic for keeping the cache and the file system consistent now lives in the OS, which tends to be more correct and efficient than one-off in-process attempts. If your disk usage favors sequential reads, read-ahead effectively pre-populates the cache with useful data on each disk read.

This suggests a very simple design: instead of maintaining as much in-memory cache as possible and rushing to flush data to the file system when space runs low, we invert the process. All data is written to a persistent log on the file system from the start, without necessarily being flushed to disk. In effect, this just means the data is transferred into the kernel's page cache.
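A minimal Java sketch of this write path is shown below. The log file name is a placeholder; the key point is that write() completes once the bytes reach the page cache, and no explicit force() (fsync) is issued:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class PageCacheLogWriter {
    public static void main(String[] args) throws IOException {
        // Placeholder log file; records are appended sequentially.
        try (FileChannel log = FileChannel.open(Paths.get("demo.log"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer record = ByteBuffer.wrap("a log record\n".getBytes());
            // write() returns once the data is in the kernel's page cache;
            // the OS decides when to flush it to disk (no log.force() here).
            log.write(record);
        }
    }
}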

About user mode and kernel mode

From a macro perspective, the operating system's architecture is divided into user mode and kernel mode. The kernel is essentially software: it controls the computer's hardware resources and provides the environment in which upper-layer applications run. User mode is the space in which those applications are active. To execute, an application must rely on resources provided by the kernel, including CPU, storage, and I/O resources. For upper-layer applications to access these resources, the kernel must provide an interface for them: system calls.

Reference links

Wikipedia – Zero copy
Linux zero copy principle
