I used to ask myself: do I really understand Java IO and NIO? They look simple and are used all the time, yet the details are easy to forget.

Java IO refers to how Java programs handle input and output; these days that usually means file and network IO. NIO is the new IO API introduced in JDK 1.4.

Java IO and NIO seem like simple topics, but they are surprisingly hard to explain clearly when you are put on the spot.

This article tries to explain Java IO and NIO from three angles: the underlying source code, diagrams of the principles, and the models behind them.

Java IO

The java.io package has been part of Java since before JDK 1.4.

Design features of the Java IO API

The IO package provides a number of APIs, including streams, Reader, Writer, and so on. They fall into two categories, depending on the unit of data delivered to the programmer:

Byte streams include InputStream and OutputStream.
- InputStream is a readable byte stream that provides the read function; concrete implementations include FileInputStream, SocketInputStream, and so on
- OutputStream is a writable byte stream that provides the write function; concrete implementations include FileOutputStream, SocketOutputStream, and so on
Character streams include Reader and Writer.
- Reader is a readable character stream that provides read functions, for example for reading input from a shell terminal; FileReader is one implementation
- Writer is a writable character stream that provides write functions; implementations include PrintWriter and FileWriter
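The difference between the two categories can be sketched as follows. This is a minimal illustration, not code from the JDK: the class name StreamKindsDemo and its two helper methods are hypothetical, and the input is assumed to be UTF-8 text. The same bytes are consumed once as a byte stream and once as a character stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reads the same data as bytes and as characters.
class StreamKindsDemo {
    // Byte stream: read() returns one byte at a time, -1 at end of stream.
    static int countBytes(byte[] data) throws IOException {
        int count = 0;
        try (InputStream in = new ByteArrayInputStream(data)) {
            while (in.read() != -1) {
                count++;
            }
        }
        return count;
    }

    // Character stream: the Reader decodes the bytes (UTF-8 assumed here)
    // and read() returns one char at a time, -1 at end of stream.
    static int countChars(byte[] data) throws IOException {
        int count = 0;
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            while (reader.read() != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9llo" contains one character that takes two bytes in UTF-8.
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + countBytes(data)); // bytes: 6
        System.out.println("chars: " + countChars(data)); // chars: 5
    }
}
```

The byte stream sees six units and the character stream five, because the Reader has already applied a character encoding on the way in.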

The underlying principles of the Java IO API

The following example copies the data read from the source file to the destination file.

FileInputStream fis = new FileInputStream(source);
FileOutputStream fos = new FileOutputStream(des);
byte[] bytes = new byte[1024 * 1024];
int len;
while ((len = fis.read(bytes)) != -1) {
    fos.write(bytes, 0, len);
}

fis is the file input stream
fos is the file output stream
When the end of the file is reached, fis.read returns -1
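For reference, a self-contained version of this copy loop might look like the sketch below. The class name StreamCopyDemo, the helper copy, the 8 KB buffer size, and the temporary file names are all assumptions made for illustration; try-with-resources is added so that both streams are closed even if the copy fails.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class wrapping the copy loop shown above.
class StreamCopyDemo {
    // Copies src to dst through a heap byte array and returns the byte count.
    static long copy(Path src, Path dst) throws IOException {
        long total = 0;
        try (FileInputStream fis = new FileInputStream(src.toFile());
             FileOutputStream fos = new FileOutputStream(dst.toFile())) {
            byte[] bytes = new byte[8 * 1024]; // buffer size is an arbitrary choice
            int len;
            // read returns the number of bytes read, or -1 at end of file
            while ((len = fis.read(bytes)) != -1) {
                fos.write(bytes, 0, len);
                total += len;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("io-src", ".bin");
        Path dst = Files.createTempFile("io-dst", ".bin");
        Files.write(src, new byte[20_000]);
        System.out.println("copied " + copy(src, dst) + " bytes");
        Files.delete(src);
        Files.delete(dst);
    }
}
```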

FileInputStream’s read function calls:

private native int readBytes(byte b[], int off, int len) throws IOException;

This is a native function, so its implementation is not visible in the Java code. The readBytes implementation can be found in the OpenJDK C source:

jint readBytes(JNIEnv *env, jobject this, jbyteArray bytes, jint off, jint len, jfieldID fid)
{
    jint nread;
    char stackBuf[BUF_SIZE];
    char *buf = NULL;
    FD fd;

    if (IS_NULL(bytes)) {
        JNU_ThrowNullPointerException(env, NULL);
        return -1;
    }
    if (outOfBounds(env, off, len, bytes)) {
        JNU_ThrowByName(env, "java/lang/IndexOutOfBoundsException", NULL);
        return -1;
    }
    if (len == 0) {
        return 0;
    } else if (len > BUF_SIZE) {
        buf = malloc(len);
        if (buf == NULL) {
            JNU_ThrowOutOfMemoryError(env, NULL);
            return 0;
        }
    } else {
        buf = stackBuf;
    }
    fd = getFD(env, this, fid);
    if (fd == -1) {
        JNU_ThrowIOException(env, "Stream Closed");
        nread = -1;
    } else {
        nread = IO_Read(fd, buf, len);
        if (nread > 0) {
            (*env)->SetByteArrayRegion(env, bytes, off, nread, (jbyte *)buf);
        } else if (nread == -1) {
            JNU_ThrowIOExceptionWithLastError(env, "Read error");
        } else { /* EOF */
            nread = -1;
        }
    }
    if (buf != stackBuf) {
        free(buf);
    }
    return nread;
}


As can be seen, the flow of the underlying C code is as follows:

Obtain a native buffer buf of len bytes (the requested read length): a stack buffer if len fits within BUF_SIZE, otherwise buf = malloc(len)
Issue the read system call: IO_Read(fd, buf, len)
Copy the data in buf into the JVM byte array: (*env)->SetByteArrayRegion(env, bytes, off, nread, (jbyte *)buf)

The write function of FileOutputStream is implemented the same way: a native buffer, then the write system call. So why use a separate chunk of native memory for the system call instead of passing the bytes array directly?

This is because system calls such as read or pread on Linux operate on a fixed memory address. The memory handed to them must therefore stay in place, whereas objects on the JVM heap can be moved by garbage collection. From the point of view of code outside the virtual machine, memory in the JVM heap is not a fixed region.

The diagram shows the following:

buf: native memory that is not managed by the JVM garbage collector, so its address is fixed
bytes: the array in the JVM heap, whose physical location may change when the garbage collector moves it
The read system call: with two context switches (into the kernel and back), the contents of fd are copied first into a kernel-mode buffer and then from the kernel-mode buffer into the user-mode buf
Finally, the data in buf is copied into the bytes array that the Java program allocated on the heap for the FileInputStream read, which completes the file read.

Java IO is blocking

We know that system calls such as read and write block. Blocking is hard to notice when reading and writing files, but it is obvious with network reads and writes, because the receive buffer of a network connection does not necessarily contain data. If it does, read returns the data immediately; if it does not, read blocks until data arrives.

For network IO it looks like this:
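A minimal sketch of this blocking behaviour on a loopback connection is shown below. The class name BlockingReadDemo, the helper measureBlockedMillis, and the payload value 42 are hypothetical; the sender connects, deliberately waits, and only then writes one byte, so the reader's call to read has nothing to return until the delay has passed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: measures how long a socket read() stays blocked.
class BlockingReadDemo {
    static long measureBlockedMillis(long senderDelayMillis)
            throws IOException, InterruptedException {
        // Port 0 asks the OS for any free port on the loopback interface.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread sender = new Thread(() -> {
                try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
                     OutputStream out = client.getOutputStream()) {
                    Thread.sleep(senderDelayMillis); // connected, but nothing sent yet
                    out.write(42);
                    out.flush();
                } catch (IOException | InterruptedException e) {
                    throw new IllegalStateException(e);
                }
            });
            sender.start();

            try (Socket connection = server.accept();
                 InputStream in = connection.getInputStream()) {
                long start = System.nanoTime();
                int value = in.read(); // blocks while the receive buffer is empty
                long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
                if (value != 42) {
                    throw new IOException("unexpected value: " + value);
                }
                sender.join();
                return elapsedMillis;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read() blocked for about "
                + measureBlockedMillis(500) + " ms");
    }
}
```

The thread calling read can do nothing else during that time, which is why the classic java.io model needs one thread per connection.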

Java IO summary

As shown above, the advantage of Java IO is that its API is simple to design and use, but it has two disadvantages:

The API is blocking, mirroring the underlying behaviour of system calls such as read and write
On top of the system call itself, each operation pays for an extra memory allocation and an extra memory copy

Java NIO

Because of these two disadvantages, Java IO is expensive and cannot cope with ever-higher concurrency requirements. Java NIO was created to address both shortcomings and to meet the growing need for high concurrency.

This chapter first briefly describes the design of the NIO API and then shows step by step how it addresses the shortcomings of Java IO. The design needed to meet high-concurrency requirements is covered in the next chapter.

Design features of the NIO API

The main data structures in NIO are Buffer and Channel. A brief overview is enough for now to get a general impression. (Skip this section if you already know them.)

Buffer is an abstract class that provides the basic bookkeeping; its subclasses, including ByteBuffer, LongBuffer, CharBuffer, and so on, provide operations such as get and put. ByteBuffer is the most widely used.
- A Buffer stores data and maintains indices such as position and limit for reading and writing that data.

Channel is the interface through which a Java program reads and writes. Concrete implementations include:
- Files: FileChannel
- Network: SocketChannel and DatagramChannel
- Pipes: Pipe.SourceChannel and Pipe.SinkChannel

In NIO, the two data structures Buffer and Channel are used together.

Specifically:

Write the data prepared in a ByteBuffer to a channel
Read the data available in a channel into a ByteBuffer
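The two directions can be sketched with a FileChannel as follows. The class name ChannelBufferDemo and the helper roundTrip are hypothetical, and the text is assumed to be encoded as UTF-8; the sketch writes a buffer into a channel and then reads the channel back into a second buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: buffer -> channel, then channel -> buffer.
class ChannelBufferDemo {
    static String roundTrip(Path file, String text) throws IOException {
        // Direction 1: write the prepared ByteBuffer data to a channel.
        try (FileChannel out = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer src = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (src.hasRemaining()) {
                out.write(src); // may write fewer bytes than remaining, so loop
            }
        }
        // Direction 2: read the available data in a channel into a ByteBuffer.
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer dst = ByteBuffer.allocate((int) in.size());
            while (dst.hasRemaining() && in.read(dst) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            dst.flip(); // switch from filling the buffer to draining it
            return StandardCharsets.UTF_8.decode(dst).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nio-roundtrip", ".txt");
        System.out.println(roundTrip(file, "hello nio"));
        Files.delete(file);
    }
}
```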

Using Buffer proficiently

A Buffer is a readable and writable data structure with several indices. The following steps show a series of operations on it:

Allocate a ByteBuffer with ByteBuffer.allocate(12). Afterwards: position = 0, capacity = limit = 12
- The buffer is empty, so data can be written into it
- Reading at this point would return zeros and advance position
Write data with byteBuffer.put("hello".getBytes()). Afterwards: position = 5, capacity = limit = 12
Flip with byteBuffer.flip(). Afterwards: position = 0, capacity = 12, limit = 5
- This switches the buffer to read mode: the data in [position, limit) is readable
- After filling the buffer from a channel and before reading from it, remember to flip
Read data from the buffer with byteBuffer.get(bytes), which copies buffer data into a byte array (of length 2 here), leaving position at 2
Clear the buffer with byteBuffer.clear(). Afterwards position and limit return to their initial values, but the data stored in the buffer is not actually erased
- Before reading from a channel into the buffer again, always remember to clear first
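The sequence above can be replayed in code. In this sketch the class name BufferStateDemo and its helpers are hypothetical; the values printed are the position, limit, and capacity after each step.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: records the buffer indices after each operation.
class BufferStateDemo {
    static String state(ByteBuffer b) {
        return "pos=" + b.position() + " lim=" + b.limit() + " cap=" + b.capacity();
    }

    static List<String> walkThrough() {
        List<String> states = new ArrayList<>();

        ByteBuffer buf = ByteBuffer.allocate(12);
        states.add("allocate " + state(buf)); // allocate pos=0 lim=12 cap=12

        buf.put("hello".getBytes(StandardCharsets.US_ASCII));
        states.add("put " + state(buf));      // put pos=5 lim=12 cap=12

        buf.flip();                           // limit = old position, position = 0
        states.add("flip " + state(buf));     // flip pos=0 lim=5 cap=12

        byte[] two = new byte[2];
        buf.get(two);                         // reads 'h' and 'e'
        states.add("get " + state(buf));      // get pos=2 lim=5 cap=12

        buf.clear();                          // resets indices, keeps the bytes
        states.add("clear " + state(buf));    // clear pos=0 lim=12 cap=12

        // The old content is still there: an absolute get does not move position.
        states.add("first byte after clear: " + (char) buf.get(0)); // h
        return states;
    }

    public static void main(String[] args) {
        walkThrough().forEach(System.out::println);
    }
}
```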

In-depth understanding of Buffer

Understanding NIO buffers requires a clear understanding of how their memory is allocated. For ByteBuffer, there are three concrete implementations:

HeapByteBuffer
DirectByteBuffer
MappedByteBuffer

The difference between them is where their memory is allocated, as shown in the following figure:

HeapByteBuffer is allocated in JVM heap memory
DirectByteBuffer uses off-heap (native) memory
MappedByteBuffer is backed by a file: given the file descriptor, the mmap system call maps the contents of the file into the virtual address space of the process
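The three kinds of buffer can be obtained and told apart through the public API, as in the sketch below. The class name BufferKindsDemo and the helpers describe and mapReadWrite are hypothetical. isDirect() reports whether the memory lives outside the heap, and hasArray() reports whether an accessible heap array backs the buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: creates one buffer of each kind and inspects it.
class BufferKindsDemo {
    static String describe(ByteBuffer b) {
        return "direct=" + b.isDirect() + " hasArray=" + b.hasArray();
    }

    // Maps the first `size` bytes of the file; the file grows if it is shorter.
    static MappedByteBuffer mapReadWrite(Path file, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nio-kinds", ".bin");
        file.toFile().deleteOnExit();

        ByteBuffer heap = ByteBuffer.allocate(16);         // JVM heap
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // off-heap memory
        ByteBuffer mapped = mapReadWrite(file, 16);        // file mapped via mmap

        System.out.println("heap:   " + describe(heap));   // direct=false hasArray=true
        System.out.println("direct: " + describe(direct)); // direct=true hasArray=false
        System.out.println("mapped: " + describe(mapped)); // direct=true hasArray=false
    }
}
```

A mapped buffer also reports itself as direct, because its memory lies outside the JVM heap as well.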

mmap

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

mmap finds a run of free, contiguous virtual addresses of the requested size in the virtual address space of the current process and maps the file onto them. Reads and writes through the mapping can bypass the copy between kernel and user buffers.

The underlying principles of NIO Read implementation

Here is a concrete example, again a file copy:

FileChannel readChannel = fis.getChannel();
FileChannel writeChannel = fos.getChannel();
ByteBuffer byteBuffer = ByteBuffer.allocate(1024 * 1024); // 1 MB buffer
while (readChannel.read(byteBuffer) != -1) {
    byteBuffer.flip();
    writeChannel.write(byteBuffer);
    byteBuffer.clear();
}

A FileChannel can be obtained from a FileInputStream via getChannel()
ByteBuffer.allocate(1024 * 1024) allocates a HeapByteBuffer

So what does readChannel.read(byteBuffer) do underneath?

ByteBuffer bb = Util.getTemporaryDirectBuffer(dst.remaining());
int n = readIntoNativeBuffer(fd, bb, position, nd);

First, a temporary direct buffer is allocated via getTemporaryDirectBuffer; in the end the native method pread0 is called:

static native int pread0(FileDescriptor fd, long address, int len,
                         long position) throws IOException;

The corresponding C code is:

JNIEXPORT jint JNICALL
Java_sun_nio_ch_FileDispatcherImpl_pread0(JNIEnv *env, jclass clazz, jobject fdo, jlong address, jint len, jlong offset)
{
    jint fd = fdval(env, fdo);
    void *buf = (void *)jlong_to_ptr(address);

    return convertReturnVal(env, pread64(fd, buf, len, offset), JNI_TRUE);
}

On Linux this ends up as a pread or pread64 system call. Why allocate a DirectByteBuffer first? For the same reason java.io uses malloc: the system call needs memory at a fixed address outside the garbage-collected heap. The difference is that here the JVM itself manages this off-heap memory.

This looks very similar to what the Java IO read does, with the temporary DirectByteBuffer playing the role of buf. But suppose the code is written like this instead:

FileChannel readChannel = fis.getChannel();
FileChannel writeChannel = fos.getChannel();
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(1024 * 1024); // 1 MB buffer
while (readChannel.read(byteBuffer) != -1) {
    byteBuffer.flip();
    writeChannel.write(byteBuffer);
    byteBuffer.clear();
}

Allocating a DirectByteBuffer with ByteBuffer.allocateDirect(1024 * 1024) lets the channel read straight into that buffer, which removes the extra copy from one user-space buffer to another.
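A self-contained version of the channel copy that accepts either kind of buffer might look like this. The class name ChannelCopyDemo, the helper copy, and the 64 KB buffer size are assumptions made for illustration; the caller decides whether to pass a heap or a direct buffer, and the result on disk is the same either way.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: the same copy loop with a caller-supplied buffer.
class ChannelCopyDemo {
    // The buffer must have a non-zero capacity.
    static long copy(Path src, Path dst, ByteBuffer buffer) throws IOException {
        long total = 0;
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            buffer.clear();
            while (in.read(buffer) != -1) {
                buffer.flip();                  // prepare to drain what was read
                while (buffer.hasRemaining()) {
                    total += out.write(buffer); // write may be partial, so loop
                }
                buffer.clear();                 // prepare for the next read
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("nio-src", ".bin");
        Path dst = Files.createTempFile("nio-dst", ".bin");
        Files.write(src, new byte[200_000]);

        long viaHeap = copy(src, dst, ByteBuffer.allocate(64 * 1024));
        long viaDirect = copy(src, dst, ByteBuffer.allocateDirect(64 * 1024));
        System.out.println("heap: " + viaHeap + " bytes, direct: " + viaDirect + " bytes");

        Files.delete(src);
        Files.delete(dst);
    }
}
```

With the heap buffer, each read and write goes through a temporary direct buffer inside the JDK; with the direct buffer, that intermediate step is not needed.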

MappedByteBuffer

Could it be more efficient? The MappedByteBuffer class is designed to improve efficiency even further.

@Test
public void test() throws IOException {
    FileChannel inChannel = FileChannel.open(Paths.get("nio.dmg"), StandardOpenOption.READ);
    FileChannel outChannel = FileChannel.open(Paths.get("nio2.dmg"), StandardOpenOption.WRITE,StandardOpenOption.READ,StandardOpenOption.CREATE);
    System.out.println("outChannel = " + outChannel);

    long size = inChannel.size();
    System.out.println("size = " + size);
    MappedByteBuffer inMappedBuffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, size);


    byte[] bytes = new byte[inMappedBuffer.limit()];
    inMappedBuffer.get(bytes);
    // A READ_WRITE mapping requires a channel opened for both reading and writing.
    MappedByteBuffer outMappedBuffer = outChannel.map(FileChannel.MapMode.READ_WRITE, 0, size);
    outMappedBuffer.put(bytes);
}

MappedByteBuffer inMappedBuffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, size) obtains a MappedByteBuffer from inChannel
MappedByteBuffer outMappedBuffer = outChannel.map(FileChannel.MapMode.READ_WRITE, 0, size) obtains another MappedByteBuffer from outChannel
The file is then copied by reading from inMappedBuffer and putting the data into outMappedBuffer

Let’s look at the FileChannel map function:

MappedByteBuffer map(MapMode mode, long position, long size) {
    addr = map0(imode, mapPosition, mapSize);
    FileDescriptor mfd;
    try {
        mfd = nd.duplicateForMapping(fd);
    } catch (IOException ioe) {
        unmap0(addr, mapSize);
        throw ioe;
    }
    int isize = (int)size;
    Unmapper um = new Unmapper(addr, mapSize, isize, mfd);
    if ((!writable) || (imode == MAP_RO)) {
        return Util.newMappedByteBufferR(isize,
                                         addr + pagePosition,
                                         mfd,
                                         um);
    } else {
        return Util.newMappedByteBuffer(isize,
                                        addr + pagePosition,
                                        mfd,
                                        um);
    }
}

This ends up calling the native method private native long map0(int prot, long position, long length). The C implementation for Linux in OpenJDK shows that it actually calls mmap64.

JNIEXPORT jlong JNICALL
Java_sun_nio_ch_FileChannelImpl_map0(JNIEnv *env, jobject this,
                                     jint prot, jlong off, jlong len, jboolean map_sync){
    void *mapAddress = 0;
    jobject fdo = (*env)->GetObjectField(env, this, chan_fd);
    jint fd = fdval(env, fdo);
    int protections = 0;
    int flags = 0;
    mapAddress = mmap64(
        0,                    /* Let OS decide location */
        len,                  /* Number of bytes to map */
        protections,          /* File permissions */
        flags,                /* Changes are shared */
        fd,                   /* File descriptor of mapped file */
        off);                 /* Offset into file */
    return ((jlong) (unsigned long) mapAddress);
}

With a MappedByteBuffer allocated via channel.map, there is no copy between user-space buffers, and the copy between kernel and user space is avoided as well.
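The mapped copy can also be written without the intermediate byte array, by putting one mapped buffer directly into the other. The following is a sketch under stated assumptions: the class name MappedCopyDemo and the helper mappedCopy are hypothetical, and the file is assumed to be smaller than 2 GB, since a single mapping cannot exceed Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: copies a file through two memory mappings.
class MappedCopyDemo {
    static void mappedCopy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.READ,   // READ_WRITE mapping needs both
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            if (size == 0) {
                return; // nothing to map
            }
            MappedByteBuffer from = in.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // Mapping beyond the current end of dst extends the file to `size`.
            MappedByteBuffer to = out.map(FileChannel.MapMode.READ_WRITE, 0, size);
            to.put(from); // copy between the two mapped regions
            to.force();   // ask the OS to flush the mapped changes to storage
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("mmap-src", ".bin");
        Path dst = Files.createTempFile("mmap-dst", ".bin");
        src.toFile().deleteOnExit();
        dst.toFile().deleteOnExit();
        Files.write(src, "mapped copy".getBytes());
        mappedCopy(src, dst);
        System.out.println(new String(Files.readAllBytes(dst)));
    }
}
```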

transferTo and related APIs

FileChannel also provides the transferTo and transferFrom APIs, which simplify file copying further, as shown in the following example:

@Test
public void test2() throws IOException {
    FileChannel inChannel = FileChannel.open(Paths.get("nio.dmg"), StandardOpenOption.READ);
    FileChannel outChannel = FileChannel.open(Paths.get("nio2.dmg"), StandardOpenOption.WRITE,StandardOpenOption.READ,StandardOpenOption.CREATE);
    inChannel.transferTo(0,inChannel.size(),outChannel);
}

Underneath, it is implemented by calling sendfile directly: sendfile(srcFD, dstFD, position, &numBytes, NULL, 0). This is the macOS form of the call; on Linux, sendfile takes its arguments in a different order
transferTo can be used for file-to-file transfer
transferTo can also be used to transfer files to sockets
Kafka uses the zero-copy transferTo of NIO to send file data over the network
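One detail worth keeping in mind is that transferTo returns the number of bytes actually transferred, which may be less than the number requested. A more defensive file copy therefore calls it in a loop, as in the sketch below; the class name TransferDemo and the helper transferAll are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: file copy with transferTo, repeated until done.
class TransferDemo {
    static long transferAll(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                // Returns how many bytes were moved, possibly fewer than asked.
                long n = in.transferTo(position, size - position, out);
                if (n <= 0) {
                    break; // no progress; stop instead of spinning forever
                }
                position += n;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("transfer-src", ".bin");
        Path dst = Files.createTempFile("transfer-dst", ".bin");
        Files.write(src, new byte[100_000]);
        System.out.println("transferred " + transferAll(src, dst) + " bytes");
        Files.delete(src);
        Files.delete(dst);
    }
}
```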

Preliminary NIO summary

This section covers some of the basic uses and main data structures of NIO, including:

Buffer operations
The basic flow and implementation principles of FileChannel read and write, and the difference between HeapByteBuffer and DirectByteBuffer
Usage and implementation principles of FileChannel with MappedByteBuffer
The usage and principle of the FileChannel transferTo

Compared with Java IO, NIO reduces the number of data copies through its various Buffer types.

The next part details how NIO solves the problem of blocking APIs in network communication.
