Wechat official account: Moon chat technology

This article is about 3500 words, and it will take you about 10 minutes to read it completely.

Preface

Netty’s high concurrency is largely due to NIO, and zero-copy technology is at the heart of how NIO moves data efficiently. Today, let’s get to grips with both in about ten minutes.

What is the traditional IO model?

Let’s take a look at a diagram to see what it takes to transfer a file from disk to a network card:

  • Step 1: Copy the file data from disk to the kernel buffer through DMA

  • Step 2: Copy the data from the kernel buffer to the user process buffer

  • Step 3: Copy the data from the user process buffer to the socket buffer

  • Step 4: Copy the data from the socket buffer to the network adapter through DMA

The user process buffer in the middle of this path is what we call an indirect buffer.

So the data is actually copied four times! And the CPU is tied up throughout the transfer: it performs the two middle copies itself and also handles the switches between user mode and kernel mode.
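The four-step path above is what an ordinary stream copy in Java goes through. The following is only a minimal sketch: the class name `TraditionalCopy` and the 8 KB buffer size are made up for illustration, and the comments describe what happens when `in` reads from a file and `out` writes to a socket.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies a stream the traditional way, through a user-space byte array. */
final class TraditionalCopy {

    static long copy(InputStream in, OutputStream out) throws IOException {
        // The user process buffer. It lives on the Java heap (size chosen arbitrarily).
        byte[] userBuffer = new byte[8192];
        long total = 0;
        int n;
        // read():  kernel buffer -> userBuffer (step 2, a CPU copy)
        // write(): userBuffer -> socket buffer (step 3, a CPU copy)
        // Steps 1 and 4 (disk -> kernel, socket buffer -> network adapter) happen below these calls.
        while ((n = in.read(userBuffer)) != -1) {
            out.write(userBuffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

In real use you might call `TraditionalCopy.copy(new FileInputStream(file), socket.getOutputStream())`. The data never changes on its way through `userBuffer`, which is exactly the waste the rest of this article tries to remove.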

This process is too tedious. I just want to transfer some data. Why does it have to pass through the user process at all?

How to optimize the traditional IO process?

Let’s continue with the flow chart and see which steps can be removed.

Notice that from the moment the data is read from disk to the moment it reaches the network card, the file content never changes, yet it has to be copied four times before it is actually sent.

In the simplest way, is it possible to transfer data from the disk directly to the network card?

Of course not. The reason is simple: the network card and the disk are both external devices, so there has to be an intermediate buffer in memory that holds the data and forwards it.

Looking at the figure above, there are two buffers that could play this role: the socket buffer and the kernel buffer. Which one should be used?

The choice is easy. The socket buffer has nothing to do with how the operating system reads the file, so only the kernel buffer can serve as the intermediate buffer.

Is it possible to send data directly to the network card through the kernel buffer?

It looks that way. But first, let’s see what the socket buffer actually does.

Function of the socket buffer

After each socket is created, two buffers are allocated, an input buffer and an output buffer.

write()/send() does not transmit data to the network immediately. Instead, the data is first written to the output buffer, and TCP then sends it from the buffer to the target machine. Once the data has been written to the buffer, the function returns successfully. When the data actually goes out to the network, and whether it reaches the target machine, is the responsibility of the TCP protocol.
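If you want to see these two buffers from Java, the socket options SO_SNDBUF and SO_RCVBUF expose their sizes. This is a small sketch only: the class name `SocketBufferSizes` is made up, and the numbers you get depend entirely on your operating system and its settings.

```java
import java.io.IOException;
import java.net.Socket;

/** Hypothetical helper: reports the default socket buffer sizes on this machine. */
final class SocketBufferSizes {

    /** Returns {send buffer size, receive buffer size} in bytes for a fresh, unconnected socket. */
    static int[] defaultSizes() throws IOException {
        try (Socket socket = new Socket()) {
            return new int[] {
                socket.getSendBufferSize(),    // SO_SNDBUF: the output buffer that write()/send() fill
                socket.getReceiveBufferSize()  // SO_RCVBUF: the input buffer that read()/recv() drain
            };
        }
    }
}
```

A write that fits into the send buffer returns right away, which is why a successful write() tells you nothing about whether the peer has received the data.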

So the socket buffer is what actually gets data onto the network, and it seems we cannot simply skip it.

But let’s think about it another way. What if we only tell the socket which data to send, that is, where it is and how long it is? Then the file content can stay in the kernel buffer and be sent directly from there.

How does zero copy improve performance?

Zero copy uses memory mapping to reduce the number of data copies, and DMA to reduce the CPU time spent on the copies that remain.

Judging by the number of copies alone, the two copies through the user process disappear and four copies become two, so the copying work drops by at least 50 percent.
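In Java, the usual entry point for this is `FileChannel.transferTo`. The sketch below assumes a blocking target channel; the class name `ZeroCopySend` is made up for illustration. Whether the JVM really avoids the user-space copy depends on the operating system and on the type of the target channel, so treat this as the shape of the call rather than a performance guarantee.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: sends a whole file to a channel without a user-space byte[] in our code. */
final class ZeroCopySend {

    static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = source.size();
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until everything is sent.
            // Assumption: target is in blocking mode, so each call makes progress.
            while (sent < size) {
                sent += source.transferTo(sent, size - sent, target);
            }
            return sent;
        }
    }
}
```

With a `SocketChannel` as the target, this is the case where the file content can go from the kernel buffer towards the network card without visiting the user process. With other targets the JVM may quietly fall back to an ordinary copy.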

DMA

DMA has come up several times above. It plays a large part in the zero-copy process and takes a lot of work off the CPU. Let’s introduce this amazing technology.

DMA (Direct Memory Access) is an important feature of all modern computers. It allows hardware devices of different speeds to communicate without placing a heavy interrupt load on the CPU. Without it, the CPU would have to copy each piece of data from the source into a register and then write it out again to the destination. During this time, the CPU is unavailable for other work.

How it works: a DMA transfer copies data from one address space to another. The CPU only initiates the transfer; the transfer itself is carried out and completed by the DMA controller.

Zero copy overall flowchart

You now have a good idea of what zero copy is, but what is NIO? Since the promise was to cover both NIO and zero copy in ten minutes, NIO comes next.

Why do you need NIO?

All system I/O is divided into two phases:

  • 1. Waiting for readiness
  • 2. The read and write operations themselves

Note that blocking while waiting for readiness does not use the CPU; it is pure waiting. Blocking during the actual read or write does use the CPU and is real work, but it is very fast: it is essentially a memory copy, with bandwidth usually above 1 GB/s, so it can be treated as taking almost no time.

So let’s start with what traditional IO does.

In traditional socket IO, you need to create one thread per connection.

One thread corresponds to one connection and only handles one connection. This is the traditional socket IO.

When the number of concurrent connections is very large, the stack memory occupied by threads and the overhead of CPU thread switching can be very large.
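To make the one-thread-per-connection model concrete, here is a rough sketch of a blocking echo server. Everything in it is an assumption made for illustration: the class name `BlockingEchoServer`, the loopback address, the OS-assigned port, and the 1 KB buffer. It is not meant as production code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical blocking echo server: every accepted connection gets its own thread. */
final class BlockingEchoServer implements AutoCloseable {
    private final ServerSocket server;

    BlockingEchoServer() throws IOException {
        // Port 0 lets the OS pick a free port; we listen on the loopback address only.
        server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    private void acceptLoop() {
        while (!server.isClosed()) {
            try {
                Socket connection = server.accept(); // blocks until a client connects
                Thread worker = new Thread(() -> echo(connection), "worker");
                worker.setDaemon(true);
                worker.start(); // one dedicated thread per connection
            } catch (IOException e) {
                return; // the server socket was closed or accept failed
            }
        }
    }

    private static void echo(Socket connection) {
        try (Socket c = connection;
             InputStream in = c.getInputStream();
             OutputStream out = c.getOutputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) { // the thread blocks here while waiting for data
                out.write(buf, 0, n);
            }
        } catch (IOException ignored) {
            // connection dropped; this worker thread simply ends
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
    }
}
```

Each worker thread spends almost all of its life parked inside `in.read(...)`. With a few connections that is harmless; with tens of thousands, the thread stacks and the context switching become the real cost.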

In this scenario the number of threads may also be smaller than the number of connections. Then no thread can afford to block while performing I/O, because if it does, some connections will not be processed.

As shown in the figure above, suppose there are three threads managing three connections. If a fourth task is inserted, you have to wait for the previous task to complete.

It operates like a pipeline in which each stage blocks the next, so traditional IO is also called BIO (blocking IO).

Traditional IO doesn’t know when to process data, so it just waits.

To solve these problems, NIO came into being.

How does NIO solve these problems?

Let’s start by introducing the core components of NIO

  • Channel
    • A channel represents a connection to an entity such as a file or a network socket. In other words, channels are the bridge that Java NIO provides for our programs to interact with the underlying I/O services of the operating system.
  • Buffer
    • You can think of it as a place to store data. Three properties of a buffer are important: capacity (total capacity), position (current position of the pointer), and limit (read/write boundary position).
  • Selector
    • A selector is a special component that collects the state (or events) of each channel. After registering the channels with the selector and setting the events we care about, we can quietly wait for the events to occur by calling the select() method.
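The three buffer properties listed above are easiest to understand by watching them change. Here is a small sketch; the class name `BufferStateDemo` and the capacity of 8 are made up for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical demo: tracks position/limit/capacity while writing to and reading from a buffer. */
final class BufferStateDemo {

    /** Formats a buffer's state as "position/limit/capacity". */
    static String state(ByteBuffer b) {
        return b.position() + "/" + b.limit() + "/" + b.capacity();
    }

    static String[] walkThrough() {
        ByteBuffer buf = ByteBuffer.allocate(8);
        String fresh = state(buf);       // 0/8/8: empty, the whole capacity is writable

        buf.put((byte) 1).put((byte) 2).put((byte) 3);
        String written = state(buf);     // 3/8/8: position advanced by the three writes

        buf.flip();                      // switch from writing to reading
        String flipped = state(buf);     // 0/3/8: limit marks the end of the written data

        buf.get();
        String afterRead = state(buf);   // 1/3/8: one byte consumed

        return new String[] { fresh, written, flipped, afterRead };
    }
}
```

Forgetting `flip()` before reading is the classic NIO mistake: position would still point past the data you just wrote, so the read would return the untouched bytes after it.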

The channel has the following four events that we can listen to:

Accept: There are acceptable connections

Connect: The connection is successful

Read: Data can be read

Write: Data can be written

We first register the handlers that should run when these events arrive. Then, at the appropriate time, we tell the selector: I am interested in this event.

In other words, the handlers registered on the selector for these four events are what process the events of each channel. When an event on a channel is ready for the next step, the selector tells the server, and the server then handles the corresponding data. The server only works when there is really something to do, so CPU resources are better utilized.
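Here is how registering interest and waiting for readiness look in code. To keep the sketch self-contained it uses an in-process `Pipe` instead of a network socket, so only the Read event appears. The class name `SelectorDemo`, the 5-second timeout, and the assumption that the message is small enough to fit in the pipe’s buffer are all choices made for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical demo: one thread waits on a selector and reads only when data is ready. */
final class SelectorDemo {

    static String roundTrip(String message) throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SinkChannel sink = pipe.sink();
             Pipe.SourceChannel source = pipe.source()) {

            // A channel must be non-blocking before it can be registered with a selector.
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ); // "I am interested in Read"

            // Write a small message into the pipe (assumed to fit in the pipe's buffer).
            byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
            ByteBuffer outgoing = ByteBuffer.wrap(bytes);
            while (outgoing.hasRemaining()) {
                sink.write(outgoing);
            }

            ByteBuffer incoming = ByteBuffer.allocate(bytes.length);
            while (incoming.hasRemaining()) {
                // Phase 1: wait for readiness. No CPU is used while parked here.
                if (selector.select(5000) == 0) {
                    throw new IOException("timed out waiting for data");
                }
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove(); // selected keys must be removed by hand
                    if (key.isReadable()) {
                        // Phase 2: the actual read, which is just a fast memory copy.
                        ((ReadableByteChannel) key.channel()).read(incoming);
                    }
                }
            }
            incoming.flip();
            return StandardCharsets.UTF_8.decode(incoming).toString();
        }
    }
}
```

A real server would register a `ServerSocketChannel` for `OP_ACCEPT` and each accepted `SocketChannel` for `OP_READ` on the same selector, so that a single thread can watch many connections at once.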

The zero copy we discussed earlier comes into play at this step, when the data is actually processed.

What’s the difference between NIO and IO?

  • 1. NIO processes data in the form of buffers (blocks), while IO reads and writes data in the form of streams.
  • 2. Where IO is built on streams, NIO processes data through channels and buffers.
  • 3. NIO channels can be bidirectional, whereas IO streams can only be unidirectional.
  • 4. NIO buffers can also be sliced, and you can create read-only buffers, direct buffers and indirect buffers. A direct buffer has its memory allocated in a special way in order to speed up I/O.
  • 5. The read and write triggers are different. NIO is driven by the polling mechanism of the selector, while IO is triggered when the data is received.
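The buffer variants mentioned in point 4 can all be created from `ByteBuffer`. A quick sketch follows; the class name `BufferKindsDemo` and the sizes are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

/** Hypothetical demo: creates indirect, direct, read-only and sliced buffers and inspects them. */
final class BufferKindsDemo {

    static String describe() {
        ByteBuffer heap = ByteBuffer.allocate(16);         // indirect: backed by a byte[] on the Java heap
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // direct: memory outside the normal Java heap
        ByteBuffer readOnly = heap.asReadOnlyBuffer();     // read-only view sharing heap's content

        // A slice is a view of the region between position and limit.
        heap.position(4);
        heap.limit(8);
        ByteBuffer slice = heap.slice();

        boolean writeRejected;
        try {
            readOnly.put((byte) 1);
            writeRejected = false;
        } catch (ReadOnlyBufferException e) {
            writeRejected = true; // writes through a read-only view are refused
        }

        return "heapIsDirect=" + heap.isDirect()
             + " directIsDirect=" + direct.isDirect()
             + " readOnly=" + readOnly.isReadOnly()
             + " sliceCapacity=" + slice.capacity()
             + " writeRejected=" + writeRejected;
    }
}
```

Direct buffers are more expensive to allocate than heap buffers, so they are usually worth it only for buffers that live a long time and are handed to channels again and again.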

Conclusion

From the traditional IO model to the NIO zero-copy model, we can see that a new technology emerges and rises because it meets needs that the previous technology could not, or because it greatly improves on the previous technology’s performance.

Traditional I/O transfer copies the data four times: twice between kernel mode and user mode, and twice between the kernel and the devices (disk and network adapter). The whole process is blocking and wastes a lot of resources.

NIO, on the other hand, uses core components such as selectors and channels to make the I/O process non-blocking and event-driven. The CPU only does the processing when the data is really ready, which saves a lot of resources and improves performance.

Zero copy means the data no longer has to be copied between user mode and kernel mode. Memory mapping is used instead, so that no copy is needed between the two.

The copies that remain are done with DMA. Its purpose is to take copying off the CPU, so that the tiring work of copying data no longer occupies CPU resources and is completed by the DMA controller instead.

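Because the data is only mapped and never copied into the user process, the zero-copy technique cannot change the data content along the way. It suits transfers where the content is sent as it is.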