This article includes:

  • The importance of file-oriented programming
  • Simple file read and write
  • Random access file reads and writes
  • NIO file read and write - FileChannel
  • Read and write files using MappedByteBuffer

The importance of file-oriented programming

In my experience there are very few interview questions about file manipulation; most revolve around high concurrency, network programming, RPC, and databases. Yet knowing how to handle files is just as important. Admittedly, we rarely need to manipulate files directly: 90 percent of the work is operating databases and network communication, and raw storage has been replaced by relational databases, distributed databases, caches, search engines, and even cloud storage.

We do occasionally need to implement a file upload interface, but uploads are usually stored in the cloud, with the local disk serving at most as a temporary staging area. I suspect many people would simply copy a complete file-storage snippet from Baidu without caring how it is implemented. Most table exports rely on existing frameworks, which makes file-oriented programming seem even less important.

Yet if we look at the underlying source code of some frameworks, we can see that mastering file manipulation matters. RocketMQ, for example, does not borrow a database or any other third-party framework for its message storage; it uses plain file storage. I have always been curious why there are no interview questions about how RocketMQ's message storage is implemented.

I have also developed some components/frameworks/middleware myself, but due to my lack of file-operation knowledge, my first thought was to rely on third-party storage middleware/libraries such as Redis, MySQL, and LevelDB, which directly increased the cost of using those frameworks. So I have always known the importance of mastering file manipulation.

Sometimes I wondered why deploying Kafka (older versions) also meant deploying ZooKeeper, when ZooKeeper was only used for managing nodes and consumers and for leader election. Keeping ZooKeeper available requires deploying several nodes, which increases the cost of running Kafka. So when I saw Alibaba Sentinel implement cluster flow limiting and provide an embedded mode, I understood why both embedded and standalone deployment modes are offered.

Last year I started developing a distributed delayed-scheduling middleware. The core functionality was actually finished long ago and has supported business features in a project via embedded deployment. However, to remove the reliance on Redis for storage, on a third-party framework for RPC, and on a broadcast mechanism for leader election, I decided to write a new version: the Raft consensus algorithm plus LevelDB (a key-value store) replaced Redis for storage, an RPC framework I wrapped around Netty replaced the third-party one, and Raft-based election replaced broadcast for choosing the leader. This directly reduced the cost of using this self-developed middleware. When implementing the log appender for Raft, I hit the same hurdle again, but this time I chose to cross it.

Among Alibaba's many open-source projects, RocketMQ uses file storage for messages, and Sentinel also uses file storage for resource metric statistics. The two frameworks share the same storage design: data file + index file. In my self-developed distributed delayed-scheduling middleware, I borrowed the file-storage index design of RocketMQ and Sentinel: data files store the logs, while index files store log IDs and the physical offsets of the logs within the data files.

Sentinel stores indexes keyed by timestamps accurate to the second. The timestamps grow monotonically and are 8 bytes long; given the bounded size of a single index file, the physical offset also fits in an 8-byte long, so each index entry is 16 bytes. Resource metric data may be missing for some periods (no requests, or an application restart), but the timestamps still increase monotonically, so we can quickly locate an index entry with a simple binary search. Since Sentinel's metric collection does not need to handle high concurrency, this design is sufficient. A minimal sketch of such a lookup appears below.
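The following sketch shows the binary search over 16-byte entries, assuming each entry is an 8-byte timestamp followed by an 8-byte physical offset; it is illustrative only, not Sentinel's actual code.

import java.io.IOException;
import java.io.RandomAccessFile;

public class TimestampIndexSearch {
    private static final int ENTRY_SIZE = 16; // 8-byte timestamp + 8-byte offset

    // Binary search over fixed-size entries sorted by timestamp;
    // returns the stored physical offset, or -1 if the second has no entry.
    public static long findOffset(RandomAccessFile idx, long targetTs) throws IOException {
        long lo = 0;
        long hi = idx.length() / ENTRY_SIZE - 1;
        while (lo <= hi) {
            long mid = (lo + hi) >>> 1;
            idx.seek(mid * ENTRY_SIZE);
            long ts = idx.readLong();  // the entry's timestamp
            if (ts == targetTs) {
                return idx.readLong(); // the physical offset follows the timestamp
            } else if (ts < targetTs) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }
}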

RocketMQ needs to support querying messages by key or by time interval, so its index storage is harder to implement than Sentinel's. A single index file has a fixed size of about 400 MB and can hold 20 million index entries; the index file's underlying storage design is essentially a HashMap laid out on the file system. Each file header stores the minimum and maximum timestamps of the messages stored in that file, which is what enables queries by time interval. The hash of a message key selects a hash slot, and the physical offset of an index entry within the index file is computed from the file header size plus the slot area plus the entry number multiplied by the size of a single index entry, as sketched below.
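To make that arithmetic concrete, here is a sketch of the offset calculation using the layout figures commonly described for RocketMQ's IndexFile (40-byte header, 5 million 4-byte hash slots, 20 million 20-byte entries); treat these constants as assumptions and verify them against the version you use.

public class IndexFileLayout {
    static final int HEADER_SIZE = 40;           // holds min/max timestamps, counts, etc.
    static final int SLOT_SIZE = 4;              // each hash slot stores an entry number
    static final int SLOT_COUNT = 5_000_000;
    static final int ENTRY_SIZE = 20;            // keyHash + phyOffset + timeDiff + prev entry
    static final int ENTRY_COUNT = 20_000_000;   // capacity of one index file

    // 40 + 5M * 4 + 20M * 20 bytes is roughly 420 MB, matching the "about 400 MB" above.

    // Physical offset of a key's hash slot inside the index file.
    static long slotOffset(int keyHash) {
        int slot = (keyHash & Integer.MAX_VALUE) % SLOT_COUNT;
        return HEADER_SIZE + (long) slot * SLOT_SIZE;
    }

    // Physical offset of the n-th index entry inside the index file.
    static long entryOffset(int n) {
        return HEADER_SIZE + (long) SLOT_COUNT * SLOT_SIZE + (long) n * ENTRY_SIZE;
    }
}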

You might think RocketMQ's and Sentinel's indexing is all about algorithms, and algorithms are indeed the soul. But software still depends on the hardware underneath. Have you considered how to jump to a given position in a file and read a specified number of bytes, or how to rewrite the data at a specified position in a file? How do you handle concurrent reads and writes, and how do you tune performance? How should we understand file NIO, how can MappedByteBuffer improve performance, and how does it work?

Answering these questions is the purpose of my study of file reading and writing, and it also reflects the importance of mastering file-oriented programming. Although we should not reinvent the wheel at work, where efficiency and error rates matter, reinventing the wheel is undoubtedly one of the most effective ways to learn.

Simple file read and write

Since creating and deleting files and directories is easy, we will skip that part and focus on reading and writing files.

FileOutputStream

Because streams are one-way, FileOutputStream can be used for simple file writing and FileInputStream for reading.

Any data output to a file is output in bytes, including images, audio, and video. In the case of images, if there is no image format parser, then image files are really just bytes of data stored in a certain format.

FileOutputStream is a byte output stream used to write byte data to a file. It supports sequential and append writes, but it cannot write at a specified position.

Example code for opening a file output stream and writing data is as follows.

public class FileOutputStreamStu {
    public void testWrite(byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream("/tmp/test.file", true)) {
            fos.write(data);
            fos.flush();
        }
    }
}

Note that a new FileOutputStream will empty the file's existing contents if the stream is not opened in append mode, and FileOutputStream's constructor defaults to non-append mode.

FileOutputStream's constructor takes the file name as parameter 1 and, as parameter 2, whether to open the stream in append mode. If true, bytes are written to the end of the file instead of the beginning.

The purpose of the flush method is to flush buffers before the stream is closed. In practice, FileOutputStream does not require flush: the buffering in question is data cached in JVM memory before the write system function is invoked, and FileOutputStream writes straight through. BufferedOutputStream, by contrast, does buffer; when its buffer still has room, the write system function is not actually called, as shown below.

public class BufferedOutputStream extends FilterOutputStream {
    public synchronized void write(byte b[], int off, int len) throws IOException {
        if (len >= buf.length) {
            flushBuffer();
            out.write(b, off, len);
            return;
        }
        if (len > buf.length - count) {
            flushBuffer();
        }
        System.arraycopy(b, off, buf, count, len); // Write only to the in-memory buffer
        count += len;
    }
}

FileInputStream

FileInputStream is the byte input stream of a file. It is used to read byte data from a file into memory, and it can only read sequentially; it cannot jump to a position.

Example code for opening a file input stream and reading data is as follows.

public class FileInputStreamStu {
    public void testRead() throws IOException {
        try (FileInputStream fis = new FileInputStream("/tmp/test/test.log")) {
            byte[] buf = new byte[1024];
            int realReadLength = fis.read(buf);
        }
    }
}

The bytes in the buf array with indexes from 0 to realReadLength - 1 are the data actually read. If read returns -1, the end of the file has been reached and no data was read.
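Because a single read call may fill only part of the buffer, reading a whole file is usually done in a loop until -1 is returned. A minimal sketch:

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class FileInputStreamStu {
    // Read the entire file by looping until read returns -1.
    public byte[] testReadAll() throws IOException {
        try (FileInputStream fis = new FileInputStream("/tmp/test/test.log");
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = fis.read(buf)) != -1) {
                bos.write(buf, 0, n); // only the first n bytes of buf are valid
            }
            return bos.toByteArray();
        }
    }
}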

Of course, we can also read byte by byte as shown in the following code.

public class FileInputStreamStu {
    public void testRead() throws IOException {
        try (FileInputStream fis = new FileInputStream("/tmp/test/test.log")) {
            int byteData = fis.read(); // Return value range: [-1, 255]
            if (byteData == -1) {
                return; // The end of the file has been reached
            }
            byte data = (byte) byteData;
            // data is the byte that was read
        }
    }
}

How you use the bytes you read depends on what data you store in your file.

If the entire file stores an image, the whole file must be read and parsed into an image according to its format. If the entire file is a configuration file, it can be read line by line: accumulate bytes until a \n newline character is encountered, then treat the accumulated bytes as one line, as follows.

public class FileInputStreamStu {
    @Test
    public void testRead() throws IOException {
        try (FileInputStream fis = new FileInputStream("/tmp/test/test.log")) {
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            int byteData;
            while ((byteData = fis.read()) != -1) {
                if (byteData == '\n') {
                    buffer.flip();
                    String line = new String(buffer.array(), buffer.position(), buffer.limit());
                    System.out.println(line);
                    buffer.clear();
                    continue;
                }
                buffer.put((byte) byteData);
            }
        }
    }
}

Java provides many file read/write APIs built on InputStream and OutputStream, such as BufferedReader, but if you are too lazy to remember all these APIs, just remember FileInputStream and FileOutputStream.
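For completeness, here is the same line-by-line reading done with BufferedReader, which handles the newline splitting for you (a minimal sketch using the same example path as above):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class BufferedReaderStu {
    public void testReadLines() throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("/tmp/test/test.log"))) {
            String line;
            while ((line = reader.readLine()) != null) { // null marks the end of the file
                System.out.println(line);
            }
        }
    }
}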

Random access file reads and writes

RandomAccessFile combines FileInputStream and FileOutputStream: it can both read and write, and it can also move to a specified position in the file before reading or writing.

RandomAccessFile is used as follows.

public class RandomAccessFileStu {
    public void testRandomWrite(long index, long offset) throws IOException {
        try (RandomAccessFile randomAccessFile = new RandomAccessFile("/tmp/test.idx", "rw")) {
            randomAccessFile.seek(index * indexLength());
            randomAccessFile.write(toByte(index));
            randomAccessFile.write(toByte(offset));
        }
    }
}
  • RandomAccessFile constructor: parameter 1 is the file path and parameter 2 is the mode; "r" opens the file read-only, while "rw" opens it for reading and writing (a read counterpart is sketched after this list);
  • seek method: on Linux and Unix operating systems this calls the lseek function.
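As a counterpart to testRandomWrite above, the following sketch seeks to an entry and reads the two longs back. It assumes each entry is 16 bytes, i.e. that indexLength() returns 16 and toByte produced big-endian 8-byte values; both are assumptions for illustration.

import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomAccessFileStu {
    public long[] testRandomRead(long index) throws IOException {
        try (RandomAccessFile randomAccessFile = new RandomAccessFile("/tmp/test.idx", "r")) {
            randomAccessFile.seek(index * 16);              // each entry: two 8-byte longs
            long storedIndex = randomAccessFile.readLong();
            long storedOffset = randomAccessFile.readLong();
            return new long[]{storedIndex, storedOffset};
        }
    }
}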

RandomAccessFile's seek method is implemented by calling a native method; the source code is as follows.

JNIEXPORT void JNICALL
Java_java_io_RandomAccessFile_seek0(JNIEnv *env,
                    jobject this, jlong pos) {
    FD fd;
    fd = GET_FD(this, raf_fd);
    if (fd == -1) {
        JNU_ThrowIOException(env, "Stream Closed");
        return;
    }
    if (pos < jlong_zero) {
        JNU_ThrowIOException(env, "Negative seek offset");
    }
    // #define IO_Lseek lseek
    else if (IO_Lseek(fd, pos, SEEK_SET) == -1) {
        JNU_ThrowIOExceptionWithLastError(env, "Seek failed");
    }
}

The Java_java_io_RandomAccessFile_seek0 function takes the RandomAccessFile object as argument 1 and the offset as argument 2. The IO_Lseek macro called in the function is actually the operating system's lseek function.

The reads, writes, and seeks provided by RandomAccessFile are all done by calling operating system functions, and the same is true of the file input and output streams described earlier.

NIO file read and write - FileChannel

A Channel represents an open connection between an IO source and a target. A Channel is similar to a traditional stream, but a Channel itself cannot access data directly; it can only interact with a Buffer. A Channel transfers data between the buffer on one side and an entity (such as a file or socket) on the other, and it supports bidirectional transfer.

Just as SocketChannel is the channel through which clients and servers communicate, FileChannel is the channel through which we read and write files. A FileChannel is thread-safe, meaning one FileChannel can be used by multiple threads, but only one thread at a time can modify the file the channel points to. If multithreaded write order must be guaranteed, the writes have to be queued.

A FileChannel can be obtained from a FileOutputStream, FileInputStream, or RandomAccessFile, or a channel can be opened directly with the FileChannel#open method.

The following example obtains a FileChannel from a FileOutputStream; obtaining one from a FileInputStream or RandomAccessFile works the same way.

public class FileChannelStu {
    public void testGetFileChannel() {
        try (FileOutputStream fos = new FileOutputStream("/tmp/test.log");
             FileChannel fileChannel = fos.getChannel()) {
            // do....
        } catch (IOException exception) {
        }
    }
}

Note that a FileChannel obtained from a FileOutputStream can only perform write operations, and a FileChannel obtained from a FileInputStream can only perform read operations; see the source code of the getChannel methods. Attempting the wrong operation fails at runtime, as the sketch below shows.
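For instance, reading from a channel obtained via FileOutputStream throws NonReadableChannelException:

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.NonReadableChannelException;

public class FileChannelStu {
    public void testWriteOnlyChannel() throws IOException {
        try (FileOutputStream fos = new FileOutputStream("/tmp/test.log");
             FileChannel channel = fos.getChannel()) {
            try {
                channel.read(ByteBuffer.allocate(16)); // read on a write-only channel
            } catch (NonReadableChannelException e) {
                // Expected: the channel was opened without read access
            }
        }
    }
}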

A FileChannel opened from a FileOutputStream, FileInputStream, or RandomAccessFile is also closed when the stream itself is closed.

To get a FileChannel that supports both reading and writing, open it with the open method, as shown below.

public class FileChannelStu {
    public void testOpenFileChannel() throws IOException {
        FileChannel channel = FileChannel.open(
                Paths.get(URI.create("file:" + rootPath + "/" + postion.fileName)),
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        // do....
        channel.close();
    }
}

The open method's second, variable-length argument is passed StandardOpenOption.READ and StandardOpenOption.WRITE to open a channel that can both read and write.

FileChannel allows files to be locked at the process level, not the thread level. File locks coordinate concurrent access and modification of the same file by multiple processes. A file lock is held by the current process; once acquired, it should be released by calling release, and it is also released automatically when the corresponding FileChannel is closed or the current JVM process exits.

Example code using a file lock is as follows.

public class FileChannelStu {
    public void testFileLock() throws IOException {
        FileChannel channel = this.channel;
        FileLock fileLock = null;
        try {
            fileLock = channel.lock(); // Acquire the file lock
            // Perform write operations
            channel.write(...);
            channel.write(...);
        } finally {
            if (fileLock != null) {
                fileLock.release(); // Release the file lock
            }
        }
    }
}

Of course, as long as we can ensure that only one process writes to a given file at a time, there is no need for file locks. RocketMQ does not use file locks either, because each Broker has its own data directory; even if multiple Brokers are deployed on one machine, multiple processes never operate on the same log file.

After removing the file lock in the above example, the code is as follows.

public class FileChannelStu {
    public void testWrite() {
        FileChannel channel = this.channel;
        channel.write(...);
        channel.write(...);
    }
}

There is still a problem with writing data concurrently: although FileChannel is thread-safe, two successive writes are not atomic as a pair, so locking is required to guarantee that the two writes land consecutively. In RocketMQ, locks are replaced by reference counters.

The force method provided by FileChannel is used to flush data to disk by calling the operating system's fsync (or fdatasync) function, as follows.

public class FileChannelStu {
    public void closeChannel() throws IOException {
        this.channel.force(true);
        this.channel.close();
    }
}

The parameter of the force method indicates whether, in addition to content changes, changes to the file's metadata should also be forced out. If a MappedByteBuffer is used, call the force method of the MappedByteBuffer directly.

The FileChannel force method eventually calls the following C function.

JNIEXPORT jint JNICALL
Java_sun_nio_ch_FileDispatcherImpl_force0(JNIEnv *env, jobject this,
                                          jobject fdo, jboolean md)
{
    jint fd = fdval(env, fdo);
    int result = 0;
    if (md == JNI_FALSE) {
        result = fdatasync(fd);
    } else {
        result = fsync(fd);
    }
    return handle(env, result, "Force failed");
}

The md parameter corresponds to the metaData argument passed to the force method.

FileChannel supports seeking (via the position method) to read or write data at a specified location, as shown below.

public class FileChannelStu {
    public void testSeekWrite() throws IOException {
        FileChannel channel = this.channel;
        synchronized (channel) {
            channel.position(100);
            channel.write(ByteBuffer.wrap(toByte(index)));
            channel.write(ByteBuffer.wrap(toByte(offset)));
        }
    }
}

The above example moves the position to a physical offset of 100 bytes and writes index and offset in sequence. Reading at a specified position is similar; the code is as follows.

public class FileChannelStu {
    public void testSeekRead() throws IOException {
        FileChannel channel = this.channel;
        synchronized (channel) {
            channel.position(100);
            ByteBuffer buffer = ByteBuffer.allocate(16);
            int realReadLength = channel.read(buffer);
            if (realReadLength == 16) {
                buffer.flip(); // Switch the buffer from write mode to read mode
                long index = buffer.getLong();
                long offset = buffer.getLong();
            }
        }
    }
}

The read method returns the number of bytes actually read. If -1 is returned, the position is at the end of the file and there is nothing left to read.
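Putting this together, here is a minimal sketch of reading an entire file through a FileChannel (the path and buffer size are illustrative):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class FileChannelStu {
    public void testReadAll() throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get("/tmp/test/test.log"),
                StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            while (channel.read(buffer) != -1) { // -1 marks the end of the file
                buffer.flip();   // switch the buffer to read mode
                // consume buffer contents here...
                buffer.clear();  // reuse the buffer for the next read
            }
        }
    }
}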

Read and write files using MappedByteBuffer

MappedByteBuffer is a file read/write API that Java provides on top of the operating system's virtual memory mapping (mmap) technology. With it, the underlying file access no longer goes through the read, write, and seek system calls.

We need to map an area of the file into memory using the FileChannel#map method, as follows.

public class MappedByteBufferStu {
    @Test
    public void testMappedByteBuffer() throws IOException {
        FileChannel fileChannel = FileChannel.open(Paths.get(URI.create("file:/tmp/test/test.log")),
                StandardOpenOption.WRITE, StandardOpenOption.READ);
        MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
        fileChannel.close();
        mappedByteBuffer.position(1024);
        mappedByteBuffer.putLong(10000L);
        mappedByteBuffer.force();
    }
}

This example obtains the MappedByteBuffer by calling the FileChannel map method, closes the channel after mapping, writes an 8-byte long at the specified position, and finally calls force to flush the data from memory back to disk.

Once a map is created, it does not depend on the file channel used to create it, so after creating the MappedByteBuffer we can close the channel without affecting the validity of the map.

Actually, mapping a file into memory is more expensive than reading or writing tens of kilobytes of data through the read and write system calls. From a performance perspective, MappedByteBuffer is suited to mapping large files into memory, such as files of hundreds of megabytes.

The FileChannel map method takes three arguments:

  • MapMode: the mapping mode. Possible values are READ_ONLY (read-only mapping), READ_WRITE (read/write mapping), and PRIVATE (private, copy-on-write mapping). READ_ONLY supports only reads and READ_WRITE supports reads and writes, while PRIVATE modifications happen only in memory and are not written back to disk.
  • position and size: the mapped region, which can be the whole file or part of it, measured in bytes.

Note that if the FileChannel is read-only, the map method's mapping mode cannot be READ_WRITE. If the file is newly created, its size becomes position + size as soon as the mapping succeeds.

The following is an example of reading data from MappedByteBuffer:

public class MappedByteBufferStu {
    @Test
    public void testMappedByteBufferOnlyRead() throws IOException {
        FileChannel fileChannel = FileChannel.open(Paths.get(URI.create("file:/tmp/test/test.log")),
                StandardOpenOption.READ);
        MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, 4096);
        fileChannel.close();
        mappedByteBuffer.position(1024);
        long value = mappedByteBuffer.getLong();
        System.out.println(value);
    }
}

Mmap bypasses the read and write system calls and avoids one copy of data between kernel space and user space. MappedByteBuffer uses direct memory rather than JVM heap memory.

Mmap only allocates address space in virtual memory; physical memory is allocated on first access. After mmap, the file content is not loaded into physical pages; only the address space has been reserved. When the process accesses such an address, the page-table lookup finds that the corresponding virtual page is not cached in physical memory, a page fault is raised, and the kernel's page-fault handler loads the corresponding file content into physical memory in page-sized units (4096 bytes).

Since physical memory is limited, when mmap writes more data than fits in physical memory, the operating system performs page replacement, evicting pages chosen by its eviction algorithm and replacing them with new ones. So the memory behind mmap can be evicted, and the operating system writes the data back to disk before evicting a page.

The data writing process is as follows:

  • 1. The data is written to the corresponding virtual memory address.
  • 2. If that virtual address is not backed by physical memory, a page fault is raised and the kernel loads the page's data into physical memory.
  • 3. The data is written to the physical memory backing the virtual address.
  • 4. Dirty pages are written back to disk by the operating system when a page is evicted or a flush is requested.

RocketMQ uses MappedByteBuffer to read and write index files, implementing a HashMap on top of the file system.

When RocketMQ creates a new CommitLog file and obtains the MappedByteBuffer through FileChannel, it performs a warm-up: it writes 0x00 bytes into each page of the mapping (the page cache) and forces a flush to write the data to the file. This warm-up loads the entire mmap mapping into physical memory through read/write access. After the warm-up, RocketMQ also locks the memory to avoid swapping: this prevents the operating system from moving the warmed-up pages to the swap area, which would otherwise cause page faults when the program later reads those pages back from swap. A simplified sketch of the warm-up follows.
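A simplified sketch of such a warm-up, assuming a 4 KB OS page size (the memory-locking step, which RocketMQ performs via a native mlock call, is omitted here):

import java.nio.MappedByteBuffer;

public class MappedFileWarmup {
    private static final int OS_PAGE_SIZE = 4096; // assumed OS page size

    // Touch every page of the mapping so the kernel faults it into physical memory.
    public static void warm(MappedByteBuffer buffer) {
        for (int offset = 0; offset < buffer.capacity(); offset += OS_PAGE_SIZE) {
            buffer.put(offset, (byte) 0); // one zero byte per page triggers a page fault
        }
        buffer.force(); // flush the warm-up writes back to the file
    }
}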
