First of all, the most important aspect of the FileChannel/MMAP contest is choosing how to read and write files. How many ways are there to do file IO in Java? The native read and write methods can be broken down into three categories: plain IO, FileChannel, and MMAP. FileWriter and FileReader, for example, live in the java.io package; they are plain IO. FileChannel lives in the java.nio package and is a form of NIO, but note that NIO does not necessarily mean non-blocking: FileChannel is, in fact, blocking. More special is the last one, MMAP, a memory-mapped way of reading and writing files derived from FileChannel by calling its map method.

Obtaining a FileChannel:

```java
FileChannel fileChannel = new RandomAccessFile(new File("db.data"), "rw").getChannel();
```

Obtaining an MMAP:

```java
MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, fileChannel.size());
```

MappedByteBuffer is the class through which Java performs MMAP operations.

We won't dwell on the traditional IO approach to byte transfer here; instead, let's focus on the differences between FileChannel and MMAP.

FileChannel read/write:

```java
byte[] data = new byte[4096];
long position = 1024L;

// Write 4KB of data at the specified position
fileChannel.write(ByteBuffer.wrap(data), position);
// Write 4KB of data at the current file pointer
fileChannel.write(ByteBuffer.wrap(data));

// Read 4KB of data at the specified position
ByteBuffer buffer = ByteBuffer.allocate(4096);
fileChannel.read(buffer, position);
// Read 4KB of data at the current file pointer
fileChannel.read(buffer);
```

FileChannel works mostly with the ByteBuffer class, which you can think of as a wrapper around byte[] that provides a rich API for manipulating bytes. Note that both the write and read methods are thread-safe: FileChannel internally uses a `private final Object positionLock = new Object();` lock to control concurrency.

Why is FileChannel faster than plain IO? That may not be an exact statement, because you have to use it correctly; for example, FileChannel only shows its strength when you write a multiple of 4KB at a time. Thanks to its use of memory buffers such as ByteBuffer, FileChannel allows very precise control over how many bytes hit the disk per write, which is not possible with plain IO. Is 4KB necessarily fastest? That depends on your machine's disk structure, and it is also affected by the operating system, the file system, and the CPU. For example, the disks used in the middleware performance challenge had to be written at least 64KB at a time to achieve the highest IOPS. A sketch of this write-aggregation idea follows the figure below.

[Figure: disk used in the middleware performance challenge rematch]
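To make the "precise control" point concrete, here is a minimal sketch of aggregating small appends so that data always hits the disk in full 4KB blocks. The class and method names are my own illustration; it is single-threaded and never flushes a trailing partial block:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative sketch: buffer small appends, write only full 4KB blocks.
public class AlignedWriter {
    private final FileChannel fileChannel;
    private final ByteBuffer writeBuffer = ByteBuffer.allocate(4 * 1024);

    public AlignedWriter(String path) throws IOException {
        this.fileChannel = new RandomAccessFile(path, "rw").getChannel();
    }

    public void append(byte[] data) throws IOException {
        int offset = 0;
        while (offset < data.length) {
            int n = Math.min(writeBuffer.remaining(), data.length - offset);
            writeBuffer.put(data, offset, n);
            offset += n;
            if (!writeBuffer.hasRemaining()) { // only ever write full 4KB blocks
                writeBuffer.flip();
                fileChannel.write(writeBuffer);
                writeBuffer.clear();
            }
        }
    }
}
```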

PolarDB, however, is completely different; its disks can only be described as exceptionally fierce. Since the competition is still in progress, I won't go into the specifics, but with "benchmark everything" skills you can find out for yourself.

The other thing that makes FileChannel so efficient... Before I get to it, let me ask a question: does FileChannel write data from the ByteBuffer directly to disk? Think for a few seconds... The answer is: no. Between the data in the ByteBuffer and the data on disk lies a layer called PageCache, the cache between user memory and the disk. We all know that disk IO and memory IO differ in speed by orders of magnitude. We can think of fileChannel.write as having "persisted" data once it reaches PageCache, but in reality the operating system performs the final write from PageCache to disk for us. Once you understand this concept, you should be able to see why FileChannel provides a force() method: it notifies the operating system to flush to disk in a timely manner.
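A minimal sketch of that flow (the file name is illustrative): the write lands in PageCache right away, and force() asks the operating system to flush it to the physical device:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ForceDemo {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = new RandomAccessFile(new File("db.data"), "rw").getChannel()) {
            ch.write(ByteBuffer.wrap(new byte[4 * 1024])); // lands in PageCache, not yet on disk
            ch.force(false); // ask the OS to flush file content (false = skip metadata) to the device
        }
    }
}
```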

Similarly, when we read with FileChannel, the data goes through the same three stages: disk -> PageCache -> user memory. Everyday users can ignore PageCache, but for a competition challenger, PageCache can never be ignored during tuning. I won't say much more about reads here, since we will come back to PageCache in the sections below; treat this as its introduction.

FileChannel is powerful enough, so what else does MappedByteBuffer have to offer? Let me keep you in suspense for a moment and first introduce the basic usage of MappedByteBuffer.

MMAP read/write:

```java
byte[] data = new byte[4];
int position = 8;

// Write 4B of data at the current mmap pointer position
mappedByteBuffer.put(data);
// Write 4B of data at the specified position
ByteBuffer subBuffer = mappedByteBuffer.slice(); // slice() returns a ByteBuffer view
subBuffer.position(position);
subBuffer.put(data);

// Read 4B of data at the current mmap pointer position
mappedByteBuffer.get(data);
// Read 4B of data at the specified position
ByteBuffer readBuffer = mappedByteBuffer.slice();
readBuffer.position(position);
readBuffer.get(data);
```

When we execute `fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, (long) (1.5 * 1024 * 1024 * 1024))` and then look at the disk, we immediately see a 1.5GB file, but it is all zeros (zero bytes). This matches MMAP's description: a memory-mapped file. Anything we subsequently do to the MappedByteBuffer in memory is mapped onto the file.
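A minimal sketch of this behavior (the file name is illustrative, and it assumes enough free disk space):

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapZeroFill {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = new RandomAccessFile("mmap.data", "rw").getChannel()) {
            long size = (long) (1.5 * 1024 * 1024 * 1024); // map() takes a long size
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            // The file appears on disk immediately: 1610612736 bytes, all zeros
            System.out.println(new File("mmap.data").length());
            mmap.put(0, (byte) 1); // a plain memory store, reflected in the file
        }
    }
}
```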

mmap maps the file into the process's virtual memory space, eliminating the copy from the kernel buffer into user space: each position in the file corresponds to an address in virtual memory, so we can operate on the file as if it were memory. It is as though the entire file had been loaded into memory, yet before the data is actually used, no physical memory is consumed and no disk reads or writes occur. Only when the data is actually touched does the virtual memory management system (VMS) load the corresponding blocks from disk into physical memory, via the page-fault mechanism. This way of reading and writing files avoids copying data from the kernel cache to user space, which makes it very efficient.

After reading that slightly more official description, you might be wondering what the point of FileChannel is, given the existence of such clever hacking technology. There are even articles on the web claiming that MMAP is an order of magnitude better than FileChannel at handling large files! From what I've learned, however, MMAP is not a silver bullet for file IO; it only performs somewhat better than FileChannel in scenarios where **a very small amount of data is written at a time**. And I have to tell you a few things that will frustrate you: using MappedByteBuffer, at least in Java, is a huge hassle and a real pain, for three main reasons:

1. MMAP requires you to specify the size of the memory map up front, and a single mapping is limited to about 1.5GB. Repeated remapping brings problems of virtual memory reclamation and reallocation, which is very unfriendly when the file size is not known in advance.
2. MMAP uses virtual memory which, like PageCache, is controlled by the operating system. It can be flushed manually with force(), but the timing is hard to get right, which can be very troublesome in small-memory scenarios.
3. When a MappedByteBuffer is no longer needed, you can manually free the virtual memory it occupies, but... in a very strange way.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.security.AccessController;
import java.security.PrivilegedAction;

public static void clean(MappedByteBuffer mappedByteBuffer) {
    ByteBuffer buffer = mappedByteBuffer;
    if (buffer == null || !buffer.isDirect() || buffer.capacity() == 0)
        return;
    invoke(invoke(viewed(buffer), "cleaner"), "clean");
}

private static Object invoke(final Object target, final String methodName, final Class<?>... args) {
    return AccessController.doPrivileged(new PrivilegedAction<Object>() {
        public Object run() {
            try {
                Method method = method(target, methodName, args);
                method.setAccessible(true);
                return method.invoke(target);
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }
    });
}

private static Method method(Object target, String methodName, Class<?>[] args) throws NoSuchMethodException {
    try {
        return target.getClass().getMethod(methodName, args);
    } catch (NoSuchMethodException e) {
        return target.getClass().getDeclaredMethod(methodName, args);
    }
}

private static ByteBuffer viewed(ByteBuffer buffer) {
    String methodName = "viewedBuffer";
    Method[] methods = buffer.getClass().getMethods();
    for (int i = 0; i < methods.length; i++) {
        if (methods[i].getName().equals("attachment")) {
            methodName = "attachment";
            break;
        }
    }
    ByteBuffer viewedBuffer = (ByteBuffer) invoke(buffer, methodName);
    if (viewedBuffer == null)
        return buffer;
    else
        return viewed(viewedBuffer);
}
```

That's right, you read that correctly: the sole purpose of all that code is to reclaim a MappedByteBuffer.

So I recommend using FileChannel first for the initial code submission, and switching to an MMAP implementation only in scenarios where you must flush very small amounts of data at a time (a few bytes, say); FileChannel covers every other scenario perfectly well (provided you understand how to use it properly). I don't have a solid theory for why MMAP outperforms FileChannel when writing small amounts of data at a time; if you have any clues, please leave a comment. In theory, FileChannel also writes to memory first, yet MMAP performs better for small writes, so MMAP should generally be the choice when flushing index data. As for whether the virtual memory allocated by MMAP is really PageCache, I think it can be approximately understood as PageCache. A rough sketch of the two write paths follows.
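For what it's worth, here is a rough sketch comparing tiny (8-byte) writes on the two paths. It is not a rigorous benchmark (no warm-up, and both paths ultimately land in PageCache), and the file name and counts are arbitrary; it only shows the shape of the comparison:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class TinyWriteCompare {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = new RandomAccessFile(new File("index.data"), "rw").getChannel()) {
            MappedByteBuffer mmap = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024 * 1024);
            long t0 = System.nanoTime();
            for (int i = 0; i < 100_000; i++) {
                mmap.putLong(i % 1000 * 8, i); // 8B write: a plain memory store
            }
            long t1 = System.nanoTime();
            ByteBuffer buf = ByteBuffer.allocate(8);
            for (int i = 0; i < 100_000; i++) {
                buf.clear();
                buf.putLong(i);
                buf.flip();
                ch.write(buf, i % 1000 * 8); // 8B write: one system call each
            }
            long t2 = System.nanoTime();
            System.out.printf("mmap: %d ms, fileChannel: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}
```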

Sequential reads are faster than random reads, and sequential writes are faster than random writes. Whether on a mechanical drive or an SSD, this conclusion holds, although the underlying reasons are not quite the same. We won't discuss old mechanical disks today; we'll focus mainly on SSDs and look at why random reads and writes are slower than sequential ones. Even though SSD internals and file systems vary, today's analysis should still be instructive.

First of all: what is a sequential read, what is a random read, what is a sequential write, and what is a random write? When you first start doing file IO, you may not have such doubts, but the more code I wrote, the more I began to doubt my own understanding. I don't know whether you have been through a similar stage, but I certainly doubted myself for a while. So, let's look at two pieces of code:

Write method 1: 64 threads, with an atomic variable recording the position of the write pointer, writing concurrently:

```java
ExecutorService executor = Executors.newFixedThreadPool(64);
AtomicLong wrotePosition = new AtomicLong(0);
for (int i = 0; i < 1024; i++) {
    executor.execute(() -> {
        try {
            fileChannel.write(ByteBuffer.wrap(new byte[4 * 1024]), wrotePosition.getAndAdd(4 * 1024));
        } catch (IOException e) {
            e.printStackTrace();
        }
    });
}
```

Write method 2: lock the writes to ensure synchronization:

```java
AtomicLong wrotePosition = new AtomicLong(0);
for (int i = 0; i < 1024; i++) {
    executor.execute(() -> write(new byte[4 * 1024]));
}

public synchronized void write(byte[] data) {
    try {
        fileChannel.write(ByteBuffer.wrap(data), wrotePosition.getAndAdd(4 * 1024));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
```

The answer is that method 2 counts as sequential writing, and the same reasoning applies to sequential reading. Locking is not a terrible thing for file operations; failing to synchronize writes and reads is! You might ask: doesn't FileChannel already have a positionLock inside to ensure thread-safe writes? Why synchronize again, and why would that be faster? My answer in plain English: unsynchronized concurrent writes from multiple threads leave holes in the file, because the execution order may be:

```
Order 1: thread1 writes position [0 ~ 4096]
Order 2: thread3 writes position [8192 ~ 12288]
Order 3: thread2 writes position [4096 ~ 8192]
```

So the writes are not strictly sequential. Don't worry about the performance degradation from locking, though; below we'll cover an optimization, file sharding, which reduces lock contention in multi-threaded reads and writes.

Why are sequential reads faster than random reads? Why are sequential writes faster than random writes? Both comparisons actually come down to the same thing: PageCache, the layer of cache between the application buffer and the disk file that we mentioned earlier.

[Figure: PageCache]

Using sequential reads as an example: after the user initiates fileChannel.read(4KB), two things actually happen:

1. The operating system loads 16KB from disk into PageCache; this is the so-called prefetch (read-ahead) operation.
2. The operating system copies 4KB from PageCache into user memory, which is what we finally get in our process.

When the user then accesses the following [4KB, 16KB] of disk content, it is served directly from PageCache. So to access 16KB of disk content: is it four disk I/Os, or one disk I/O plus four memory I/Os? The answer is obvious; all of this is PageCache's optimization. A small timing sketch follows.
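A rough way to observe this, assuming db.data exists with at least 16KB of content and the page cache is cold (for example, after `echo 3 > /proc/sys/vm/drop_caches` on Linux): the first 4KB read should be noticeably slower than the next three, which are served from PageCache:

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReadAheadDemo {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = new RandomAccessFile("db.data", "r").getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(4 * 1024);
            for (int i = 0; i < 4; i++) {
                long t0 = System.nanoTime();
                buf.clear();
                ch.read(buf, i * 4096L); // read #0 hits disk and triggers read-ahead
                System.out.printf("4KB read #%d: %d us%n", i, (System.nanoTime() - t0) / 1000);
            }
        }
    }
}
```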

Deep thought:

- When memory is tight, will the allocation of PageCache be affected?
- How is the PageCache prefetch size determined? Is it fixed at 16KB?
- Can PageCache hits be monitored?
- In what scenarios does PageCache fail, and if it fails, what remedies do we have?

Here are my brief answers; the logic behind them is left for the reader to examine:

- When memory is tight, PageCache prefetching is affected. This is measured empirically; I haven't found literature to support it.
- PageCache is adjusted dynamically and can be tuned through Linux system parameters; by default it occupies up to 20% of total memory.
- github.com/brendangreg… GitHub hosts a tool for monitoring PageCache.
- A very interesting optimization point: if you can't control PageCache's caching behavior, how about doing the read-ahead yourself? (See the sketch after this list.)
- Sequential writes follow the same principle as sequential reads; both are absorbed by PageCache. I leave this one for the reader to ponder.
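Here is a minimal sketch of that do-it-yourself read-ahead idea: pull a large block into a user-space buffer in one call, then serve small reads from it instead of relying on the kernel's prefetch. It assumes a single-threaded reader and ignores EOF and bounds handling; all names and the 64KB window are illustrative:

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class Prefetcher {
    private final FileChannel fileChannel;
    private final ByteBuffer cache = ByteBuffer.allocate(64 * 1024); // prefetch window
    private long cacheStart = -1;

    public Prefetcher(String path) throws Exception {
        this.fileChannel = new RandomAccessFile(path, "r").getChannel();
    }

    public void read(byte[] dst, long position) throws Exception {
        if (cacheStart < 0 || position < cacheStart
                || position + dst.length > cacheStart + cache.limit()) {
            cache.clear();
            fileChannel.read(cache, position); // one big disk read
            cache.flip();
            cacheStart = position;
        }
        // serve the small read from the prefetched block
        ByteBuffer view = cache.duplicate();
        view.position((int) (position - cacheStart));
        view.get(dst);
    }
}
```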

The earlier FileChannel examples already used in-heap memory: ByteBuffer.allocate(4 * 1024). ByteBuffer also provides a way to allocate out-of-heap (direct) memory: ByteBuffer.allocateDirect(4 * 1024). This leads to a series of questions: when should I use in-heap memory, and when should I use direct memory?

I'm not going to spend too much time on this; just a comparison:

- Allocation: ByteBuffer.allocate(size) returns in-heap memory, while ByteBuffer.allocateDirect(size), which calls unsafe.allocateMemory(size) underneath, returns direct memory. Note that ByteBuffer.allocate(900M) can report errors even when the JVM has more than 1.5GB of memory free, and direct memory can be capped at the JVM level with the -XX:MaxDirectMemorySize parameter.
- Reclamation: when a DirectByteBuffer is no longer in use, an internal Cleaner hook reclaims it; to be on the safe side, manual collection can be considered: ((DirectBuffer) buffer).cleaner().clean();
- Memory copying: in-heap memory -> out-of-heap memory -> PageCache, versus out-of-heap memory -> PageCache directly.
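A small demo of the two allocation paths and manual reclamation. Note that DirectBuffer and Cleaner are JDK-internal APIs; this works on JDK 8, while later JDKs restrict access to them:

```java
import java.nio.ByteBuffer;
import sun.nio.ch.DirectBuffer;

public class DirectMemoryDemo {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(4 * 1024);         // in-heap, backed by a byte[]
        ByteBuffer direct = ByteBuffer.allocateDirect(4 * 1024); // off-heap

        System.out.println(heap.hasArray());   // true
        System.out.println(direct.hasArray()); // false

        // Manual reclamation of the off-heap block (internal API, JDK 8):
        ((DirectBuffer) direct).cleaner().clean();
    }
}
```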

Some best practices regarding in-heap and out-of-heap memory:

- When large blocks of memory are needed, in-heap memory is limited and out-of-heap memory may be the only option.
- Out-of-heap memory is suitable for objects with medium or long life cycles. (Short-lived objects are reclaimed during YGC with no performance impact; it is large, long-lived in-heap objects that hurt the application during FGC.)
- Flushing in-heap memory to disk requires an extra copy to out-of-heap memory first; the details can be seen in the FileChannel implementation source, as well as in my other article on why the JDK does this.
- Pooling combined with out-of-heap memory lets short-lived objects that involve I/O operations reuse off-heap buffers (Netty uses this approach).
- When using something like ThreadLocal<byte[]>, consider backing it with out-of-heap rather than in-heap memory; creating out-of-heap memory is relatively expensive, so once it is allocated, reuse it whenever possible.

Black magic: UNSAFE

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeUtil {
    public static final Unsafe UNSAFE;
    static {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            UNSAFE = (Unsafe) field.get(null);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

It is UNSAFE that lets us do so many things we otherwise couldn't even imagine, using dark magic. I want to mention just one or two of them here.

Allocating direct memory and performing a memory copy:

```java
ByteBuffer buffer = ByteBuffer.allocateDirect(4 * 1024 * 1024);
long address = ((DirectBuffer) buffer).address();
byte[] data = new byte[4 * 1024 * 1024];
UNSAFE.copyMemory(data, 16, null, address, 4 * 1024 * 1024);
```

The copyMemory method copies memory, whether in-heap or out-of-heap. Parameters 1 and 2 describe the source, parameters 3 and 4 the target, and the fifth parameter is the size of the copy. For an in-heap byte array, pass the array object plus the fixed ARRAY_BYTE_BASE_OFFSET constant of 16; for out-of-heap memory, pass null plus the direct memory address, which can be obtained via ((DirectBuffer) buffer).address(). Why not copy normally instead of resorting to UNSAFE? Because it's fast, of course! UNSAFE can also copy between MappedByteBuffers, which is effectively a disk-to-disk copy; a sketch follows.
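For instance, here is a hedged sketch of that disk-to-disk copy between two MappedByteBuffers, reusing the UnsafeUtil class from above. The file names and the 4MB size are illustrative, it assumes src.data is at least that large, and it relies on the same internal APIs:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import sun.nio.ch.DirectBuffer;

public class MmapCopyDemo {
    public static void main(String[] args) throws Exception {
        long size = 4 * 1024 * 1024;
        try (FileChannel src = new RandomAccessFile("src.data", "r").getChannel();
             FileChannel dst = new RandomAccessFile("dst.data", "rw").getChannel()) {
            MappedByteBuffer srcMap = src.map(FileChannel.MapMode.READ_ONLY, 0, size);
            MappedByteBuffer dstMap = dst.map(FileChannel.MapMode.READ_WRITE, 0, size);
            // Both mappings are direct memory, so UNSAFE can copy between
            // their raw addresses: effectively a file-to-file copy.
            UnsafeUtil.UNSAFE.copyMemory(null, ((DirectBuffer) srcMap).address(),
                                         null, ((DirectBuffer) dstMap).address(), size);
        }
    }
}
```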

UNSAFE has plenty of other interesting uses; setting aside its safety concerns, I won't elaborate on them further here.

We've already mentioned that writes and reads need locking to stay sequential, and I'll emphasize it again: locking is not a terrible thing, and file IO operations don't depend on multithreading all that much. But sequential reads and writes behind a single lock certainly can't saturate disk IO; today's powerful CPUs deserve to be squeezed harder, don't they? File partitioning achieves both goals at once: it keeps reads and writes sequential within each file, and it reduces lock contention.

So the question is: how many files are appropriate? More files mean fewer lock conflicts, but too many files mean severe fragmentation; individual files become too small and the cache is hard to hit. How to balance this trade-off? There is no theoretical answer: benchmark everything! A sharding sketch follows.
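As a reference point, here is a minimal sketch of the sharding idea; the shard count and file naming are illustrative, not a recommendation:

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hash keys across N files so that threads writing different shards never
// contend on the same lock, while each shard's writes stay sequential.
public class ShardedFiles {
    private static final int SHARDS = 16;
    private final FileChannel[] channels = new FileChannel[SHARDS];
    private final long[] positions = new long[SHARDS];

    public ShardedFiles(String dir) throws Exception {
        for (int i = 0; i < SHARDS; i++) {
            channels[i] = new RandomAccessFile(dir + "/shard-" + i + ".data", "rw").getChannel();
        }
    }

    public void write(int key, byte[] data) throws Exception {
        int shard = key % SHARDS;
        synchronized (channels[shard]) { // one lock per shard, not a global lock
            channels[shard].write(ByteBuffer.wrap(data), positions[shard]);
            positions[shard] += data.length;
        }
    }
}
```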

Direct IO

[Figure: Linux IO]

Finally, let's talk about an IO method we haven't covered yet: Direct IO. What, Java has this stuff? Blogger, are you lying to me? Didn't you tell me there were only three IO modes?! Don't rush to scold me: this is not a mode Java supports natively, but it can be achieved by calling native methods via JNA/JNI. Direct IO bypasses PageCache, but PageCache is a good thing, so why skip it? On closer inspection, there is one situation where Direct IO really does help, and yes, it is the one we haven't talked about much: random reads. IO methods such as fileChannel.read() trigger PageCache prefetching, but for random reads we don't really want the operating system to do all that work for us, unless we get very lucky and a random read happens to hit PageCache, which we can't count on. So Direct IO, although slammed by Linus, still has value for random reads, cutting out the overhead of going through the Block I/O layer into PageCache.

Anyway, how does Java use Direct IO? Are there any restrictions? Java does not currently support it natively, but some kind folks have packaged a Java JNA library that implements Direct IO for Java. GitHub address: github.com/smacke/jayd…

```java
int bufferSize = 20 * 1024 * 1024;
DirectRandomAccessFile directFile = new DirectRandomAccessFile(new File("dio.data"), "rw", bufferSize);
for (int i = 0; i < bufferSize / 4096; i++) {
    byte[] buffer = new byte[4 * 1024];
    directFile.read(buffer);
    directFile.readFully(buffer);
}
directFile.close();
```

Note, however, that **only Linux supports Direct IO**! So, boys, it's time to get your hands dirty and install Linux. It's also worth mentioning that Direct IO is expected to receive native support after the release of JDK 10, so let's look forward to it!

To sum up, everything above is experience accumulated from personal practice. Some conclusions lack literature support, so please feel free to correct any mistakes. As for the performance analysis of the PolarDB preliminary-stage data, I'll start a separate article once the semi-finals arrive, analyzing exactly which optimization points can be applied. Of course, many people know these tips already; what decides the final result is the overall architecture design, together with an understanding of file IO, the operating system, the file system, the CPU, and language features. Java performance challenges aren't exactly popular, but they're still fun, and I hope this knowledge of file IO helps you in your next contest.