Preface

I/O can be divided into disk I/O and network I/O; disk I/O is generally faster than network I/O. This article mainly covers disk I/O; network I/O will be covered in a follow-up article.

Java abstracts NIO I/O into channels, mainly FileChannel (disk I/O) and SocketChannel (network I/O).

If your understanding of I/O stops at the API level, it is not enough; you also need to understand how the operating system handles I/O in order to avoid unnecessary pitfalls.

Content of this article:

  • Reading, writing, and copying files with FileChannel
  • An introduction to ByteBuffer
  • FileLock, the JVM inter-process file lock
  • Which is faster: HeapByteBuffer, DirectByteBuffer, or mmap
  • The whole I/O process described through Linux kernel concepts: virtual memory, system calls, file descriptors, inodes, the page cache, and page faults
  • How DirectByteBuffer memory outside the JVM heap is reclaimed

All the computer-systems diagrams in this article are from "Computer Systems: A Programmer's Perspective".

My understanding of Linux comes from books and reference materials. The content of this article is mainly my own understanding plus code verification; some descriptions may not be perfectly accurate, so focus on the reasoning process.

NIO

NIO was introduced in Java 1.4. It stands for Non-blocking I/O and is also known as New I/O.

NIO abstracts I/O into channels that are buffer-oriented (operating on a block of data at a time) and can be non-blocking.

A Channel is only responsible for transferring data, and a Buffer is responsible for storing data.

Buffer

The capacity, limit, and position attributes of a Buffer are important; misunderstanding them causes many pitfalls when reading and writing files.

capacity is the maximum capacity of the buffer, equal to the length of the underlying array.

limit is a pointer marking the first index in the array that must not be read or written; only indexes below it can be operated on.

position is the index at which the next element will be read or written.

@Test
public void run1() {
    // `DirectByteBuffer`, allocated outside the JVM heap
    final ByteBuffer byteBuffer = ByteBuffer.allocateDirect(1024);
    // `HeapByteBuffer`, allocated inside the JVM heap
    final ByteBuffer allocate = ByteBuffer.allocate(1024);
}
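A minimal sketch of how the three attributes move during a put/flip/get cycle (BufferPointers is a made-up class name for illustration):

```java
import java.nio.ByteBuffer;

public class BufferPointers {
    public static void main(String[] args) {
        // capacity=8, position=0, limit=8 right after allocation
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put((byte) 1).put((byte) 2).put((byte) 3);   // position moves to 3
        System.out.println("after put:  pos=" + buf.position() + " limit=" + buf.limit());
        buf.flip();                                      // limit=3 (old position), position=0
        System.out.println("after flip: pos=" + buf.position() + " limit=" + buf.limit());
        buf.get();                                       // reading advances position to 1
        System.out.println("after get:  pos=" + buf.position() + " limit=" + buf.limit());
    }
}
```

flip() is what the read/write examples later rely on: it sets limit to the number of bytes just written and rewinds position to 0.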

HeapByteBuffer is allocated inside the JVM heap. It is fast to create but slower for file I/O, because the data must first be copied into a temporary DirectByteBuffer (explained later). Under the hood it is backed by a byte array.

DirectByteBuffer is allocated outside the JVM heap, so it is not limited by the JVM heap size; it is slower to create but faster to read and write. In Linux terms this memory belongs to the process heap. The total direct memory is capped by the JVM parameter -XX:MaxDirectMemorySize.
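The difference is easy to observe: a HeapByteBuffer is backed by an accessible byte array while a DirectByteBuffer is not. A small check:

```java
import java.nio.ByteBuffer;

public class HeapVsDirect {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        // HeapByteBuffer wraps a byte[] inside the JVM heap
        System.out.println(heap.hasArray() + " " + heap.isDirect());     // true false
        // DirectByteBuffer holds an off-heap address; there is no backing array
        System.out.println(direct.hasArray() + " " + direct.isDirect()); // false true
    }
}
```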

Set the JVM heap to 100m (-Xmx100m) and run the program below: it fails with Exception in thread "main" java.lang.OutOfMemoryError: Java heap space. Because the heap is capped at 100m and some class objects already live in it, less than 100m is actually free, so requesting a 100m HeapByteBuffer fails.

public class BufferNio {
    // -Xmx100m
    public static void main(String[] args) throws InterruptedException {
        // HeapByteBuffer lives in the JVM heap; this OOMs because less than 100m
        // of the heap is actually free (loaded classes and objects also occupy it)
        System.out.println("Requesting a 100m HeapByteBuffer");
        Thread.sleep(5000);
        ByteBuffer.allocate(100 * 1024 * 1024);
    }
}

With the JVM heap set to 100m and -XX:MaxDirectMemorySize=1g, the program below creates DirectByteBuffers in an endless loop. It prints the allocation-success message about ten times and then fails with Exception in thread "main" java.lang.OutOfMemoryError: Direct buffer memory, which raises the question of how this off-heap DirectByteBuffer memory is reclaimed.

public class BufferNio {
    // -Xmx100m -XX:MaxDirectMemorySize=1g
    public static void main(String[] args) throws InterruptedException {
        System.out.println("Requesting 100m DirectByteBuffers");
        final ArrayList<Object> objects = new ArrayList<>();
        while (true) {
            // DirectByteBuffer is not in the JVM heap, so the allocation succeeds,
            // but it is not unlimited: it is capped by MaxDirectMemorySize
            final ByteBuffer byteBuffer = ByteBuffer.allocateDirect(100 * 1024 * 1024);
            objects.add(byteBuffer);
            System.out.println("DirectByteBuffer allocated");
            System.out.println(ManagementFactory.getMemoryMXBean().getHeapMemoryUsage());
            System.out.println(ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage());
        }
    }
}

FileChannel

Read the file

@Test
public void read() throws IOException {
    final Path path = Paths.get(FILE_NAME);
    // Create a FileChannel specifying the read and write permissions for this channel
    final FileChannel open = FileChannel.open(path, StandardOpenOption.READ);
    // Create a buffer that is the same size as this file
    final ByteBuffer allocate = ByteBuffer.allocate((int) open.size());
    open.read(allocate);
    open.close();
    // Switch to read mode, position=0
    allocate.flip();
    // Decode with UTF-8
    final CharBuffer decode = StandardCharsets.UTF_8.decode(allocate);
    System.out.println(decode.toString());
}

Write files

@Test
public void write() throws IOException {
    final Path path = Paths.get("demo" + FILE_NAME);
    // The channel has write permission. Create indicates that the file is created when it does not exist
    final FileChannel open = FileChannel.open(path, StandardOpenOption.WRITE, StandardOpenOption.CREATE);
    final ByteBuffer allocate = ByteBuffer.allocate(1024);
    allocate.put("Zhang Panqin AAAAA-1111111".getBytes(StandardCharsets.UTF_8));
    // Switch to read mode so the channel can drain the buffer: limit=position, position=0
    allocate.flip();
    open.write(allocate);
    open.close();
}

Copy the file

@Test
public void copy() throws IOException {
    final Path srcPath = Paths.get(FILE_NAME);
    final Path destPath = Paths.get("demo" + FILE_NAME);
    final FileChannel srcChannel = FileChannel.open(srcPath, StandardOpenOption.READ);
    final FileChannel destChannel = FileChannel.open(destPath, StandardOpenOption.WRITE, StandardOpenOption.CREATE);
    // In the implementation (FileChannelImpl), transferTo uses an 8MB MappedByteBuffer
    // to copy data; a single call transfers at most Integer.MAX_VALUE bytes
    srcChannel.transferTo(0, srcChannel.size(), destChannel);
    destChannel.close();
    srcChannel.close();
}
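transferTo may also move fewer bytes than requested, so copying a file of arbitrary size needs a loop that advances by the actual transfer count. A minimal sketch (the class and method names ChunkedCopy/copy are made up for illustration):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedCopy {
    // Copy src to dest, looping because a single transferTo call is capped
    // at Integer.MAX_VALUE bytes and may transfer less than requested
    public static void copy(Path src, Path dest) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dest, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                // advance by the number of bytes actually transferred
                position += in.transferTo(position, size - position, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".txt");
        Files.write(src, "chunked copy demo".getBytes());
        Path dest = Files.createTempFile("dest", ".txt");
        copy(src, dest);
        System.out.println(new String(Files.readAllBytes(dest))); // chunked copy demo
        Files.delete(src);
        Files.delete(dest);
    }
}
```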

FileLock

FileLock is a file lock between JVM processes: it takes effect across multiple JVM processes, requires the process to have read and write permission on the file, and comes in shared and exclusive variants.

The same process cannot lock duplicate regions of the same file.

In the same process, if the first thread locks region (0, 2) of the file and another thread locks (1, 2), the locked regions overlap and the program throws an exception.

If one process locks (0, 2) and another process locks (1, 2), no exception is thrown, because the overlap check is internal to a single JVM; the second process simply blocks until the lock is released.

Run the following program as two separate JVM processes; the output is:

The first process prints, without any pause:

Lock 0-3 obtained, code is not blocked
Lock 4-7 obtained, code is not blocked

The second process prints:

Lock 4-7 obtained, code is not blocked
Lock 0-3 obtained, code is not blocked

When the first process runs, it locks bytes 0-3 of file_lock.txt and holds the lock for about 10 seconds. When the second process runs, its attempt to lock the same region blocks until the first process releases the FileLock.

public class FileLock {
    public static void main(String[] args) throws IOException, InterruptedException {
        final Path path = Paths.get("file_lock.txt");
        final FileChannel open = FileChannel.open(path, StandardOpenOption.WRITE, StandardOpenOption.READ);
        final CountDownLatch countDownLatch = new CountDownLatch(2);
        new Thread(() -> {
         
            try (final java.nio.channels.FileLock lock = open.lock(0, 3, false)) {
             
                System.out.println("Lock 0-3 obtained, code is not blocked.");
                Thread.sleep(10000);
                final ByteBuffer wrap = ByteBuffer.wrap("aaa".getBytes());
                open.position(0);
                open.write(wrap);
                Thread.sleep(10000);
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            } finally {
                countDownLatch.countDown();
            }
        }).start();
        Thread.sleep(1000);
        new Thread(() -> {
            try (final java.nio.channels.FileLock lock = open.lock(4, 3, false)) {
                System.out.println("Lock 4-7 obtained, code is not blocked");
                final ByteBuffer wrap = ByteBuffer.wrap("bbb".getBytes());
                open.position(4);
                open.write(wrap);
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                countDownLatch.countDown();
            }
        }).start();
        countDownLatch.await();
        open.close();
    }
}

If the second thread in the program above instead calls java.nio.channels.FileLock lock = open.lock(1, 3, false), the locked regions of the file overlap. Since the same process is not allowed to lock overlapping regions, the program throws:

Exception in thread "Thread-1" java.nio.channels.OverlappingFileLockException
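The overlap check can also be observed without blocking by using tryLock, which fails fast instead of waiting. A sketch against a temporary file (TryLockDemo is a made-up name):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TryLockDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("trylock", ".txt");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // exclusive lock on bytes 0-2; tryLock never blocks
            FileLock first = ch.tryLock(0, 3, false);
            System.out.println("first lock acquired: " + (first != null));
            try {
                // overlapping region in the same JVM: immediate exception, no blocking
                ch.tryLock(1, 3, false);
            } catch (OverlappingFileLockException e) {
                System.out.println("overlap rejected inside the same JVM");
            }
            first.release();
        }
        Files.delete(tmp);
    }
}
```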

HeapByteBuffer or DirectByteBuffer?

FileChannel's implementation class, FileChannelImpl, checks on each read or write whether the ByteBuffer is a DirectByteBuffer; if it is not, it creates a temporary DirectByteBuffer and copies the original buffer's data into it before use. So in terms of read/write efficiency DirectByteBuffer is faster, but it is comparatively expensive to create.

Although DirectByteBuffer memory lives outside the heap, a FullGC is still triggered when off-heap usage reaches -XX:MaxDirectMemorySize, and an OOM is thrown if no off-heap memory can be reclaimed.

// The following program keeps running, because FullGC reclaims the off-heap direct memory
public class BufferNio {
    // -Xmx100m -XX:MaxDirectMemorySize=1g
    public static void main(String[] args) throws InterruptedException {
        System.out.println("Requesting 100m DirectByteBuffers");
        while (true) {
            // The buffer is never stored in a variable, so it is unreachable from GC roots
            ByteBuffer.allocateDirect(100 * 1024 * 1024);
            System.out.println("DirectByteBuffer allocated");
        }
    }
}

Because the DirectByteBuffer objects created in the endless loop are unreachable from any GC root, they are collected. Note that the GC itself only collects the small DirectByteBuffer object in the heap; the off-heap memory is released as a side effect of that collection.

From the DirectByteBuffer source you can see that it has a member variable private final Cleaner cleaner;. The Cleaner is a PhantomReference: when a FullGC finds the DirectByteBuffer unreachable from GC roots, the Cleaner's clean() method is invoked (triggered in Reference.tryHandlePending). Its thunk is a DirectByteBuffer.Deallocator instance, whose run() method calls Unsafe.freeMemory to release the off-heap memory.

public class Cleaner extends PhantomReference<Object> {
    private final Runnable thunk;

    public void clean() {
        if (remove(this)) {
            try {
                this.thunk.run();
            } catch (final Throwable var2) {
                AccessController.doPrivileged(new PrivilegedAction<Void>() {
                    public Void run() {
                        if (System.err != null) {
                            (new Error("Cleaner terminated abnormally", var2)).printStackTrace();
                        }
                        System.exit(1);
                        return null;
                    }
                });
            }
        }
    }
}

Memory mapping

When an application reads a file, the data must first be read from disk into kernel space (there is no page cache hit on the first read) and then copied from kernel space into user space before the application can use it. When all of the file's data is already in the kernel's page cache, it is copied from the kernel to user space without touching the disk.

When an application writes to a file, the data is copied into the kernel's page cache. The application can then call fsync to flush the data from the kernel to disk (once the call returns success, the data will not be lost). Alternatively it can skip fsync: the write is considered complete as soon as the data reaches the page cache, and the kernel's I/O scheduler flushes it to disk at an appropriate time (a sudden power loss can then lose data, which is why programs like MySQL manage flushing to disk themselves).
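In Java, the fsync step corresponds to FileChannel.force. A small sketch that writes and then explicitly forces the data past the page cache (ForceDemo is a made-up name):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ForceDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("force-demo", ".txt");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            // write() only guarantees the data reached the kernel page cache
            ch.write(ByteBuffer.wrap("durable".getBytes()));
            // force(true) behaves like fsync: block until data and metadata hit the disk
            ch.force(true);
        }
        System.out.println(new String(Files.readAllBytes(tmp))); // durable
        Files.delete(tmp);
    }
}
```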

Notice that every read and write involves a copy between user space and kernel space. If that copy could be removed, I/O would be much more efficient. That is what memory mapping does: it points user-space memory and kernel-space memory at the same physical memory. Memory mapping is abbreviated mmap and corresponds to the mmap system call.

In this way, reading and writing data in user space actually operates on kernel-space memory, eliminating the extra data copy.

In Linux, the addresses a process uses are virtual addresses, which the CPU maps to physical addresses in physical memory. Mmap maps a virtual address of the user process and a virtual address of kernel space to the same physical memory, reducing data copying.

After calling the mmap system call, the user program no longer needs the read and write system calls to access the data.

Mapping of virtual memory to physical memory

A computer’s main memory can be thought of as an array of M consecutive bytes, each with a unique Physical Address.

The CPU uses Virtual Addressing (VA) to locate physical addresses.

The CPU translates the virtual address used by the process to the physical address in the physical main Memory through the Memory Management Unit (MMU) on the CPU to obtain data.

After a process is loaded, the system allocates a virtual address space for the process. When a virtual address in the virtual address space is used, it is mapped to a physical address in main memory.

When multiple processes need to share data, they only need to map some virtual addresses in their virtual address space to the same physical address.

We usually do not manipulate data byte by byte (that would be inefficient); we access runs of consecutive bytes. Memory management therefore divides memory into pages: Physical pages in physical memory and Virtual pages in virtual memory. A typical page size is 4KB.

The system manages the correspondence between virtual pages and physical pages through the MMU and the Page Table, which is an array of Page Table Entries (PTEs).

If a PTE's valid bit is 1, the data is in memory; if it is 0, the data is on disk.

When the data corresponding to the accessed virtual address is no longer in physical memory, there are two cases:

1. When the memory is sufficient, the data corresponding to the virtual page on disk will be directly loaded into the physical memory.

2. When memory is insufficient, swapping is triggered: following an LRU policy, recently unused virtual pages are written back to disk to evict some data from physical memory, their PTEs are set to 0, and the needed data is then loaded from disk into memory.

The virtual memory of the process

Linux assigns each process its own virtual address space.

When a program runs, the entire code file is not loaded into memory at once; it is loaded lazily.

The disk controller manages the disk in blocks, and the system talks to the disk controller through the page cache. A block contains multiple sectors, and a page contains multiple blocks. A file on disk corresponds to an inode, which records the file's metadata and the location of its data. When the system starts, inode data is loaded into main memory. An inode in memory also records where the file's data sits in physical memory (the page cache); if the data has not been loaded, the inode records no page-cache location. Before a program executes, its virtual memory is initialized, recording which inodes its code corresponds to.

When the program executes, the system initializes the program's virtual memory and then runs the main function. When execution reaches code that has not been loaded into memory, a page fault is triggered: the corresponding inode is found from the virtual page, the needed data is loaded from disk into memory, and the virtual page is marked as loaded so that the next access goes straight to memory.

Mmap in Java

FileChannel.map also returns a DirectByteBuffer (a MappedByteBuffer), but one built with a different constructor and bound to a file descriptor. Reading or writing it does not trigger the read and write system calls; that is the benefit of memory mapping.

public class MMapDemo {
    public static void main(String[] args) throws URISyntaxException, IOException, InterruptedException {
        final URL resource = MMapDemo.class.getClassLoader().getResource("demo.txt");
        final Path path = Paths.get(resource.toURI());
        final FileChannel open = FileChannel.open(path, StandardOpenOption.READ);
        // Initiate the system call mmap
        final MappedByteBuffer map = open.map(FileChannel.MapMode.READ_ONLY, 0, open.size());
        // When the data is read, the data can be fetched directly from its own virtual memory
        final CharBuffer decode = StandardCharsets.UTF_8.decode(map);
        System.out.println(decode.toString());
        open.close();
        Thread.sleep(100000);
    }
}

Although the buffer below is also a DirectByteBuffer, unlike mmap it is not bound to a file descriptor. Reading and writing still copies data between user space and kernel space and still issues system calls, which is less efficient than mmap.

public class MMapDemo {
    public static void main(String[] args) throws URISyntaxException, IOException, InterruptedException {
        final URL resource = MMapDemo.class.getClassLoader().getResource("demo.txt");
        final Path path = Paths.get(resource.toURI());
        final FileChannel open = FileChannel.open(path, StandardOpenOption.READ);
        // This DirectByteBuffer uses a different construct, which goes through the system call read
        final ByteBuffer byteBuffer = ByteBuffer.allocateDirect(1024);
        final int read = open.read(byteBuffer);
        byteBuffer.flip();
        System.out.println(StandardCharsets.UTF_8.decode(byteBuffer).toString());
        Thread.sleep(100000);
    }
}

To trace the system calls a program makes, use strace on Linux:

#!/bin/bash
rm -fr /nio/out.*
cd /nio/target/classes
strace -ff -o /nio/out java com.fly.blog.nio.MMapDemo

In terms of read/write speed: mmap is faster than ByteBuffer.allocateDirect, which in turn is faster than ByteBuffer.allocate.


This article was created by Zhang Panqin on his blog www.mflyyou.cn/. It may be reproduced and quoted freely, but please credit the author and indicate the source of the article.

If reprinting to a WeChat official account, please add the author's official QR code at the end of the article. WeChat official account name: Mflyyou