Java file IO stream processing

Reading and writing files in Java with MappedByteBuffer, FileChannel, RandomAccessFile, and FileInputStream/FileOutputStream.

Introduction to Java file IO

Java introduced NIO classes such as ByteBuffer in JDK 1.4, which let Java programmers read and write files block by block instead of as streams. Traditional Java IO usually handles large files with BufferedReader, BufferedInputStream, and other buffered IO classes. Java NIO, however, also offers MappedByteBuffer for working with large files. This article explains how that high performance is achieved internally and examines whether FileChannel or MappedByteBuffer is faster.

In addition, the JDK exposes the two kings of IO performance optimization, zero-copy sendfile and mmap, through FileChannel.transferTo and FileChannel.map. But how much do they actually help? How fast are they compared with RandomAccessFile, and under what circumstances are they faster?
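As a rough illustration of the zero-copy path, the sketch below copies one file to another with FileChannel.transferTo. Whether the JDK actually hands the work to sendfile (or a similar system call) depends on the platform, the JDK version, and the kind of target channel; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopyCopy {
    // Copies source to target with FileChannel.transferTo, letting the JDK use a kernel-level
    // transfer where available instead of copying the bytes through a buffer on the Java heap.
    static long copy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so keep going until done
            while (position < size) {
                long moved = in.transferTo(position, size - position, out);
                if (moved <= 0) {
                    break; // nothing more could be transferred (e.g. the source shrank)
                }
                position += moved;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        long copied = copy(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("copied " + copied + " bytes");
    }
}
```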

Pain points of stream-based file IO in Java

When reading and writing large files (above 2 GB) with traditional streams, loading the data into memory is very likely to exhaust it, which makes the task almost impossible.

MappedByteBuffer

One capability of MappedByteBuffer is that it lets us read and write files that are too large to fit in memory. With it, we can act as if the entire file were in memory (in reality only the pages currently in use are in physical memory; the rest stay on disk and are loaded on demand through virtual memory) and access it essentially as one very large array, which greatly simplifies tasks such as modifying large files.
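To make the "very large array" idea concrete, here is a minimal sketch (class and method names are hypothetical) that reverses a file in place by indexing a READ_WRITE mapping with absolute get and put. It assumes the file fits in a single mapping, that is, at most Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedReverse {
    // Reverses a file's bytes in place, treating the mapping like an array.
    static void reverseInPlace(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            for (int i = 0, j = buf.limit() - 1; i < j; i++, j--) {
                byte tmp = buf.get(i);   // read "array element" i
                buf.put(i, buf.get(j));  // overwrite it with element j
                buf.put(j, tmp);
            }
            buf.force(); // write the dirty pages back to the file
        }
    }

    public static void main(String[] args) throws IOException {
        reverseInPlace(Paths.get(args[0]));
    }
}
```

Only the pages that are actually touched are faulted into physical memory, so the code never has to hold the whole file on the Java heap.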

The technical principle of MappedByteBuffer

The underlying technique used by MappedByteBuffer is memory mapping. So before discussing MappedByteBuffer itself, let’s review how a computer manages memory, starting with a few terms:

  • MMU: the CPU's memory management unit, which translates virtual addresses into physical addresses.

  • Physical memory: the memory space actually provided by the installed memory modules (RAM).

  • Virtual memory: a memory-management technique that lets a program behave as if it had a contiguous block of available memory (one complete, continuous address space). In reality that space is usually split across multiple fragments of physical memory, with some parts temporarily stored on external disk storage and swapped in when they are needed.

  • Page file: virtual memory is usually backed by a page file, a special file on the hard disk. The operating system is responsible for reading and writing its contents; this process is called paging (swapping).

  • Pagefile: the disk file the operating system creates as backing store for virtual memory. On Windows this is pagefile.sys; when physical memory runs out, data that is temporarily unused is moved into it.

  • Page fault: an exception raised by the MMU when a program accesses a page that is mapped in the virtual address space but not currently loaded into physical memory. If the operating system determines that the access is valid, it loads the page from disk (from the page file or, for a memory-mapped file, from the file itself) into physical memory.

Virtual memory and physical memory

When a process runs, the memory it needs may exceed the total physical memory. For example, if the memory module is 256 MB and the program creates a 2 GB data area, not all of the data can be loaded into physical memory. Some of it must be kept on another medium (such as a hard disk) and brought into physical memory when the process needs to access it.

What are virtual memory addresses and physical memory addresses?

Suppose your computer is 32-bit, so its address bus is 32 bits wide and it can address the range 0x00000000 to 0xFFFFFFFF (4 GB). If the machine has only 256 MB of physical memory (0x00000000 to 0x0FFFFFFF) and your process generates an address outside that 256 MB range, what does the computer do? Before answering, let’s look at how the computer pages memory.

Paging and page frames

Dividing the virtual address space (4 GB on a 32-bit machine) into fixed-size units produces pages; dividing the physical address space (say, 256 MB) into units of the same size produces page frames. Because pages and page frames are the same size, in this example there are many more virtual pages than physical page frames.
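Plugging in numbers makes the imbalance concrete. Assuming a 4 KB page size (common on x86, but not stated in the text) and binary units (1 GB = 2^30 bytes):

$$
\frac{4\,\mathrm{GB}}{4\,\mathrm{KB}} = \frac{2^{32}}{2^{12}} = 2^{20} = 1{,}048{,}576\ \text{pages},
\qquad
\frac{256\,\mathrm{MB}}{4\,\mathrm{KB}} = \frac{2^{28}}{2^{12}} = 2^{16} = 65{,}536\ \text{page frames}
$$

so there are 2^20 / 2^16 = 16 virtual pages for every physical page frame.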

A page table

The computer keeps a page table that maps virtual pages to physical page frames, that is, page numbers to page frame numbers. Each page that is currently in memory maps to exactly one page frame.

Missing pages (page faults)

Because there are more virtual pages than physical page frames, at any moment some virtual pages have no corresponding physical page frame. Does that make those addresses unusable? No; the operating system handles this case with its page fault mechanism.

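When a page fault occurs, the operating system picks the page frame that has been used the least (for example with the LFU, least frequently used, policy), invalidates it, writes its contents to disk if they were modified, loads the requested page into that frame, and updates the mapping in the page table. In this way every page can be brought in when it is needed.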
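The replacement step can be simulated in a few lines. This toy sketch (hypothetical class name) counts page faults for a reference string with a fixed number of frames and LFU eviction, breaking ties by evicting the page that was loaded first. Real operating systems typically use cheaper approximations rather than exact access counts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LfuPagingSim {
    // Counts page faults for `references` when `frameCount` page frames are managed with LFU.
    static int countFaults(int[] references, int frameCount) {
        // resident page -> accesses since it was loaded; insertion order records load order
        Map<Integer, Integer> frames = new LinkedHashMap<>();
        int faults = 0;
        for (int page : references) {
            if (frames.containsKey(page)) {
                frames.merge(page, 1, Integer::sum); // hit: the page is already in a frame
                continue;
            }
            faults++; // page fault: the page has to be brought in from disk
            if (frames.size() == frameCount) {
                int victim = -1;
                int fewest = Integer.MAX_VALUE;
                for (Map.Entry<Integer, Integer> e : frames.entrySet()) {
                    if (e.getValue() < fewest) { // strict < keeps the earliest-loaded page on ties
                        fewest = e.getValue();
                        victim = e.getKey();
                    }
                }
                frames.remove(victim); // invalidate the victim (a real OS writes it back if dirty)
            }
            frames.put(page, 1); // load the requested page and update the "page table"
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = {1, 2, 3, 1, 4, 1, 2, 5};
        System.out.println("page faults: " + countFaults(refs, 3)); // prints "page faults: 6"
    }
}
```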

Virtual memory address and physical memory address

A virtual address consists of a page number (used to find the entry in the page table) and an offset (the position within the page, ranging from 0 to the page size minus 1).

Conversion of virtual memory to physical memory

For example, take a virtual address whose page number is 4 and whose offset is 20. Translation works like this: first, look up the page frame number for page 4 in the page table (say, 8). If the page is not in memory, the page fault mechanism loads it first. Then the page frame number and the offset are combined by the MMU into a real physical address (physical address = frame number × page size + offset).
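The same translation can be written out as arithmetic. In this sketch, the 4 KB page size and the map-based page table are assumptions for illustration; a real MMU performs the lookup in hardware using page-table entries.

```java
import java.util.HashMap;
import java.util.Map;

public class AddressTranslation {
    static final int PAGE_SIZE = 4096; // assumed 4 KB pages

    // Hypothetical page table: virtual page number -> physical page frame number
    private final Map<Long, Long> pageTable = new HashMap<>();

    void map(long page, long frame) {
        pageTable.put(page, frame);
    }

    // Splits a virtual address into page number and offset, looks up the frame,
    // and combines frame number and offset into a physical address.
    long translate(long virtualAddress) {
        long page = virtualAddress / PAGE_SIZE;
        long offset = virtualAddress % PAGE_SIZE;
        Long frame = pageTable.get(page);
        if (frame == null) {
            // a real MMU raises a page fault here; the OS loads the page and the access is retried
            throw new IllegalStateException("page fault on page " + page);
        }
        return frame * PAGE_SIZE + offset;
    }

    public static void main(String[] args) {
        AddressTranslation mmu = new AddressTranslation();
        mmu.map(4, 8);                       // page 4 lives in frame 8, as in the text's example
        long virtual = 4L * PAGE_SIZE + 20;  // page 4, offset 20
        System.out.println(virtual + " -> " + mmu.translate(virtual)); // prints "16404 -> 32788"
    }
}
```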

Conclusion

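Setting up a memory mapping is relatively expensive on most operating systems; the FileChannel.map Javadoc notes that mapping a file is more expensive than reading or writing a few tens of kilobytes with the usual read and write methods. So MappedByteBuffer pays off for reading and writing large files, while small files are better served by ordinary reads and writes.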
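A sketch of that rule of thumb: compute a CRC32 with a plain read for small files and through a read-only mapping for larger ones. The 64 KB cut-off (MAP_THRESHOLD) is a hypothetical value chosen only to sit above "a few tens of kilobytes"; a real threshold should come from measurements on the target system.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

public class ChecksumReader {
    // Hypothetical cut-off: below it, an ordinary read is assumed to be cheaper than a mapping.
    static final long MAP_THRESHOLD = 64 * 1024;

    // Computes the CRC32 of a file, reading small files normally and mapping larger ones.
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        long size = Files.size(file);
        if (size < MAP_THRESHOLD) {
            crc.update(Files.readAllBytes(file)); // small file: one plain read into a byte[]
        } else {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // large file: checksum straight from the mapped memory
                // (a single mapping covers at most Integer.MAX_VALUE bytes)
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
                crc.update(buf);
            }
        }
        return crc.getValue();
    }
}
```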

Use the MappedByteBuffer case

MappedByteBuffer extends ByteBuffer, so it has the usual position and limit pointers and can expose views of other buffer types (such as asIntBuffer() or asCharBuffer()). You can treat an entire file, up to Integer.MAX_VALUE bytes per mapping, as a ByteBuffer.

  • java.lang.Object
    • java.nio.Buffer
      • java.nio.ByteBuffer
        • java.nio.MappedByteBuffer

Simple read example
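import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Scanner;

public class MappedByteBufferTest {
    public static void main(String[] args) {
        File file = new File("D://data.txt");
        long len = file.length();
        byte[] ds = new byte[(int) len]; // assumes the file is smaller than 2 GB
        // try-with-resources closes the file; the mapping itself stays valid until the buffer is GC'd
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer mappedByteBuffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, len);
            // Copy the mapped bytes into the array one at a time
            for (int offset = 0; offset < len; offset++) {
                ds[offset] = mappedByteBuffer.get();
            }
            // An empty delimiter makes next() return one character at a time
            Scanner scan = new Scanner(new ByteArrayInputStream(ds)).useDelimiter("");
            while (scan.hasNext()) {
                System.out.print(scan.next());
            }
        } catch (IOException e) {
            e.printStackTrace(); // report the failure instead of swallowing it
        }
    }
}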
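The view buffers mentioned earlier work on a mapping just as on any other ByteBuffer. This minimal sketch (hypothetical class name) writes an int[] into a file through asIntBuffer() on a READ_WRITE mapping and reads it back the same way. The file is sized with RandomAccessFile.setLength before mapping so that the whole mapped region lies inside the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedIntView {
    // Writes ints into a file through an IntBuffer view of a READ_WRITE mapping.
    static void writeInts(String path, int[] values) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel ch = raf.getChannel()) {
            long bytes = (long) values.length * Integer.BYTES;
            raf.setLength(bytes); // size the file first so the mapped region lies inside it
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            IntBuffer ints = mapped.asIntBuffer(); // view: 4 bytes per int, big-endian by default
            ints.put(values);
            mapped.force();
        }
    }

    // Reads the ints back through a read-only mapping and the same kind of view.
    static int[] readInts(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            IntBuffer ints = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asIntBuffer();
            int[] out = new int[ints.remaining()];
            ints.get(out);
            return out;
        }
    }
}
```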
Problems with MappedByteBuffer

The whole process is very fast with MappedByteBuffer. A MappedByteBuffer is created with the FileChannel.map method, and the mapping between the buffer and the file it represents stays in effect until the buffer itself is garbage-collected.

The official explanation (from the FileChannel.map Javadoc)

The buffer and the mapping that it represents will remain valid until the buffer itself is garbage-collected. A mapping, once established, is not dependent upon the file channel that was used to create it. Closing the channel, in particular, has no effect upon the validity of the mapping.

This can cause problems, mainly with memory usage and with uncertainty about when the file is released. A file mapped by a MappedByteBuffer is released only when the buffer is garbage-collected, and when that happens is undefined.

For example, suppose we map a source file with a MappedByteBuffer, copy it, and then want to delete the source file once the copy is finished. On Windows the deletion fails, because the MappedByteBuffer still holds a handle to the source file, which keeps the file in an undeletable state.

There is no official API for releasing the handle, but you can try the following approach.

A real-world scenario

The requirement: copy a file with MappedByteBuffer and delete the source file after the copy completes. Because a MappedByteBuffer and its associated resources stay valid until garbage collection, the buffer still holds a reference to the source file, so deleting the source file fails.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.AccessController;
import java.security.PrivilegedAction;

public class MappedCopyAndDelete {
	public static void copyFileAndRemoveResource() {
		File source = new File("D:\\eee.txt");
		File dest = new File("C:\\eee.txt");
		MappedByteBuffer buf = null;
		try (FileChannel in = new FileInputStream(source).getChannel();
				FileChannel out = new FileOutputStream(dest).getChannel()) {
			long size = in.size();
			buf = in.map(FileChannel.MapMode.READ_ONLY, 0, size);
			out.write(buf); // a blocking FileChannel writes all remaining bytes
			// Force the copied data in the destination file out to the storage device.
			// (buf.force() is not needed: buf is a READ_ONLY mapping, so it holds no changes to flush.)
			out.force(true);
			System.out.println("File copy completed!");
			// System.gc();
			// Closing the file channel does NOT release the MappedByteBuffer
			in.close(); // safe even if an exception occurs earlier, because try-with-resources also closes it
			// Force release of the MappedByteBuffer resource
			clean(buf);
			// After the file is copied, delete the source file.
			// File.delete() only returns false on failure, while java.nio.file.Files.delete
			// throws an exception explaining why, so it is used instead of source.delete().
			Files.delete(Paths.get("D:\\eee.txt"));
			System.out.println("Delete successful!");
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	// Relies on sun.misc.Cleaner, so this only works on Java 8 and earlier.
	public static void clean(final MappedByteBuffer buffer) throws Exception {
		if (buffer == null) {
			return;
		}
		buffer.force(); // flush pending changes first (meaningful for READ_WRITE mappings)
		AccessController.doPrivileged(new PrivilegedAction<Object>() { // run with elevated privileges
			@Override
			public Object run() {
				try {
					// The runtime class is java.nio.DirectByteBuffer(R); its cleaner() returns a sun.misc.Cleaner
					Method getCleanerMethod = buffer.getClass().getMethod("cleaner", new Class[0]);
					getCleanerMethod.setAccessible(true);
					sun.misc.Cleaner cleaner = (sun.misc.Cleaner) getCleanerMethod.invoke(buffer, new Object[0]);
					cleaner.clean(); // unmaps the memory immediately
				} catch (Exception e) {
					e.printStackTrace();
				}
				return null;
			}
		});
		/*
		 * When writing this code in MyEclipse with import sun.misc.Cleaner, Eclipse reports:
		 * Access restriction: The type Cleaner is not accessible due to
		 * restriction on required library *\rt.jar
		 * Access restriction: The constructor Cleaner() is not accessible due to
		 * restriction on required library *\rt.jar
		 *
		 * Solution 1 (recommended): remove the JRE System Library from the project build path,
		 * add the JRE System Library back, and recompile; everything then works.
		 * Solution 2: Windows -> Preferences -> Java -> Compiler -> Errors/Warnings ->
		 * Deprecated and restricted API -> Forbidden reference (access rules) -> change to Warning
		 */
	}
}

At this point the solution is clear: unmap the memory and release the mapped object, and only then delete the file.

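Unfortunately, Java has no particularly good solution. Somewhat surprisingly, MappedByteBuffer has no public unmap method, and there is still none as of Java 10. The buffer returned by map() is actually an instance of a package-private JDK class such as DirectByteBufferR, declared with the default access modifier as "class DirectByteBufferR extends DirectByteBuffer implements DirectBuffer", so application code cannot reference it directly.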

Internally, Java offers a "temporary" workaround: DirectByteBufferR.cleaner().clean(). Keep in mind that this is only a stopgap.

  • After all, sun.misc.Cleaner was hidden in Java 9 (moved to jdk.internal.ref.Cleaner), and not all JVM vendors provide it.
  • Another option is to call System.gc() explicitly so that the GC reclaims the buffer, and with it the mapping, sooner.
  • Frankly, though, this approach has even more drawbacks: explicit GC calls are strongly discouraged, and many production environments disable them altogether (for example with -XX:+DisableExplicitGC), so it cannot be regarded as a real fix for this bug.
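On Java 9 and later, where the sun.misc.Cleaner cast in the clean() method above no longer works, a commonly used workaround is sun.misc.Unsafe.invokeCleaner(ByteBuffer), added in JDK 9 in the jdk.unsupported module. The sketch below reaches it through reflection (the class name UnmapHelper is hypothetical). It is an unsupported internal API that may change, and touching the buffer after unmapping it can crash the JVM.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;

public class UnmapHelper {
    // Releases a mapping immediately via sun.misc.Unsafe.invokeCleaner (JDK 9+, unsupported API).
    // The buffer must not be accessed afterwards; doing so can crash the JVM.
    static void unmap(MappedByteBuffer buffer) throws ReflectiveOperationException {
        Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
        Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe"); // the singleton instance
        theUnsafe.setAccessible(true);
        Object unsafe = theUnsafe.get(null);
        Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
        invokeCleaner.invoke(unsafe, buffer); // fails if the buffer is a slice or duplicate
    }
}
```

In the copy-and-delete scenario, the order would be: map the file, finish the copy, close the channels, call UnmapHelper.unmap(buf), and only then delete the source file.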
The map process

FileChannel provides a map method that maps a file into virtual memory. Usually the entire file is mapped; a large file is mapped in segments, because a single mapping can cover at most Integer.MAX_VALUE bytes (about 2 GB).
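A sketch of segment-by-segment mapping, assuming the goal is to count occurrences of one byte (for example newlines) in a file that may exceed 2 GB. The class name and the segmentSize parameter are hypothetical; segmentSize is a tuning knob that keeps each map() call within Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SegmentedMapping {
    // Counts occurrences of `target` by mapping the file one segment at a time.
    static long count(Path file, byte target, long segmentSize) throws IOException {
        if (segmentSize <= 0 || segmentSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("segmentSize must be in 1..Integer.MAX_VALUE");
        }
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long position = 0; position < size; position += segmentSize) {
                long length = Math.min(segmentSize, size - position); // the last segment may be shorter
                MappedByteBuffer segment = ch.map(FileChannel.MapMode.READ_ONLY, position, length);
                while (segment.hasRemaining()) {
                    if (segment.get() == target) {
                        total++;
                    }
                }
            }
        }
        return total;
    }
}
```

For example, count(path, (byte) '\n', 1L << 30) would count lines in 1 GB segments.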

Parameters and fields involved in FileChannel.map
  • MapMode mode: the access mode for the memory-mapped region.
    • MapMode.READ_ONLY: read-only. Any attempt to modify the resulting buffer throws a ReadOnlyBufferException.
    • MapMode.READ_WRITE: read/write. Changes to the resulting buffer are eventually written to the file, but they are not necessarily visible to other programs that map the same file.
    • MapMode.PRIVATE: read/write for private use. Changes are not written to the file; only the buffer itself sees them, because the modified portions are copied privately. This behavior is called "copy on write".
  • position: the position in the file where the mapping starts.
  • size: the size of the region to map, which must not exceed Integer.MAX_VALUE.
  • allocationGranularity: an internal field of the JDK's FileChannelImpl, initialized by the native method initIDs(); the start of a mapping is aligned down to a multiple of this value.
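The copy-on-write behavior of MapMode.PRIVATE can be checked directly. In this minimal sketch (hypothetical names), the buffer sees the modified byte while the file on disk keeps its original content. PRIVATE, like READ_WRITE, requires a channel opened for both reading and writing.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PrivateMappingDemo {
    // Maps the file in PRIVATE (copy-on-write) mode, overwrites the first byte in the buffer,
    // and returns what the buffer now holds at index 0. The file itself is not modified.
    static byte patchFirstBytePrivately(Path file, byte value) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.PRIVATE, 0, ch.size());
            buf.put(0, value); // the write goes to a private copy of the affected page
            return buf.get(0);
        }
    }
}
```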

MQs that use zero-copy IO

There are many message queues in the Java world, such as ActiveMQ, Kafka, RocketMQ, and Qunar's QMQ, and they are the biggest users of NIO zero copy in Java.

However, their performance differs. Setting aside other factors such as network transfer, data structure design, and file storage layout, let’s look only at how the broker reads and writes files, and how the MQs differ in that respect.

The file read/write approaches used by each MQ are summarized below:

  • Kafka: log records are read and written through FileChannel; index files are read and written through mmap.

  • RocketMQ: reads from disk use mmap; writes use mmap by default but can be configured to use FileChannel.

  • QMQ (Qunar MQ): reads from disk use mmap; writes use FileChannel.

  • ActiveMQ 5.15: all reads and writes are based on RandomAccessFile, which is why we abandoned ActiveMQ.

mmap is the operating system's memory-mapping facility: it maps a file into the process's address space through the MMU, so that once the pages are resident, random reads and writes to the file run at nearly the speed of reading and writing memory.
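To make the write-side difference between the brokers concrete, here is a simplified, hypothetical sketch of the two append styles: ordinary FileChannel writes, and copying into a preallocated READ_WRITE mapping (roughly the idea behind fixed-size mapped log files). It is not the code of Kafka, RocketMQ, or QMQ, and it ignores concurrency, rolling over to a new file, and crash recovery.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class AppendStyles {
    // FileChannel style: every record is an ordinary write() at the end of the file.
    static void appendWithChannel(Path file, List<String> records) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            for (String record : records) {
                ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            }
        }
    }

    // mmap style: preallocate a fixed-size file, map it once, then append by copying into memory.
    // Returns the number of bytes used; the rest of the region stays zero-filled.
    static int appendWithMmap(Path file, List<String> records, int regionSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel ch = raf.getChannel()) {
            raf.setLength(regionSize); // preallocate so the whole mapped region is backed by the file
            MappedByteBuffer region = ch.map(FileChannel.MapMode.READ_WRITE, 0, regionSize);
            for (String record : records) {
                // throws BufferOverflowException once the region is full
                region.put(record.getBytes(StandardCharsets.UTF_8));
            }
            region.force(); // flush dirty pages to disk
            return region.position();
        }
    }
}
```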
