This article takes an in-depth look at file descriptors and examines, through the JDK source code, how the file input/output streams use them.

To avoid reinventing the wheel, some content and pictures are taken from the reference materials listed at the end of the article. This article is for study and exchange only; commercial use is strictly prohibited.

What is a file descriptor?

[1] In Linux, everything can be considered a file, and files can be divided into regular files, directory files, link files, and device files. A file descriptor is an index created by the kernel to manage opened files efficiently. It is a non-negative integer (usually a small one) that refers to an open file, and every system call that performs I/O goes through a file descriptor. When a program starts, 0 is standard input, 1 is standard output, and 2 is standard error. If you open a new file at this point, its file descriptor will be 3. POSIX requires that every time a file (including a socket) is opened, the smallest file descriptor number available in the current process is used. Therefore, if you are careless in network code (for example, by continuing to use a descriptor number after closing it), that number may already refer to a different file or socket, and crosstalk can occur. (Figure: the standard file descriptors 0, 1, and 2.)
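The "smallest available number" rule can be illustrated with a toy model. This is only a sketch: FdTableSketch and its methods are names I made up for illustration, and a BitSet is of course not how the kernel stores its table.

```java
import java.util.BitSet;

// Toy model of a per-process descriptor table; not how the kernel stores it.
class FdTableSketch {
    private final BitSet inUse = new BitSet();

    FdTableSketch() {
        // Slots 0, 1 and 2 are taken by stdin, stdout and stderr at startup.
        inUse.set(0, 3);
    }

    // "Open" something: hand out the smallest descriptor that is currently free.
    int open() {
        int fd = inUse.nextClearBit(0);
        inUse.set(fd);
        return fd;
    }

    // "Close" a descriptor so that its number can be reused.
    void close(int fd) {
        inUse.clear(fd);
    }

    public static void main(String[] args) {
        FdTableSketch table = new FdTableSketch();
        int a = table.open();   // 3
        int b = table.open();   // 4
        table.close(a);
        int c = table.open();   // 3 again: the lowest free number is reused
        System.out.println(a + " " + b + " " + c); // prints "3 4 3"
    }
}
```

The reuse of 3 in the last step is exactly why a stale descriptor number is dangerous: code that still holds the old 3 would now be talking to whatever was opened most recently.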

File descriptors in Linux processes

[2] The data structure that describes a Linux process also shows this:

struct task_struct {
    // Process status
    long              state;
    // Virtual memory structure
    struct mm_struct  *mm;
    // Process ID
    pid_t             pid;
    // A pointer to the parent process
    struct task_struct __rcu  *parent;
    // List of child processes
    struct list_head        children;
    // A pointer to file system information
    struct fs_struct        *fs;
    // Pointer to the table of files opened by the process
    struct files_struct     *files;
};

The files pointer refers to a table that holds pointers to all files opened by the process. When a process is created, the first three slots of this table are filled with default values that point to the standard input, standard output, and standard error streams. This is why a program uses file descriptor 0 for input, 1 for output, and 2 for error by default. (Figure: slots 0, 1, and 2 of the files array pointing to the three standard streams.)

So, in Linux, redirection, pipes, and similar features simply change where the first three slots of the process's files array point. The "everything is a file" design makes these operations very elegant.
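To make the redirection idea concrete, here is a small simulation. It is a sketch under obvious assumptions: RedirectSketch is a hypothetical class, and a StringBuilder stands in for a terminal or a file. The point is that the program only ever names a descriptor number, so re-pointing a slot changes the destination without touching the program.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a process's files array, where each slot points at some "open file".
class RedirectSketch {
    // Slot i plays the role of descriptor i; a StringBuilder stands in for a file or terminal.
    private final List<StringBuilder> files = new ArrayList<>();

    RedirectSketch(StringBuilder stdin, StringBuilder stdout, StringBuilder stderr) {
        files.add(stdin);   // slot 0
        files.add(stdout);  // slot 1
        files.add(stderr);  // slot 2
    }

    // The program only names a descriptor number, never the target itself.
    void write(int fd, String data) {
        files.get(fd).append(data);
    }

    // A redirect such as "> out.txt" re-points a slot; the writing code is unchanged.
    void redirect(int fd, StringBuilder target) {
        files.set(fd, target);
    }

    static String demo() {
        StringBuilder terminal = new StringBuilder();
        StringBuilder logFile = new StringBuilder();
        RedirectSketch process = new RedirectSketch(new StringBuilder(), terminal, terminal);
        process.write(1, "hello ");      // goes to the terminal
        process.redirect(1, logFile);    // slot 1 now points at the file
        process.write(1, "world");       // same call, different destination
        return "terminal=[" + terminal + "] file=[" + logFile + "]";
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "terminal=[hello ] file=[world]"
    }
}
```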

FileDescriptor

FileDescriptor is an abstraction of file descriptors in the JVM. Take a look inside:

public final class FileDescriptor {
    // File descriptor (files array subscript)
    private int fd;
    // The instance to which the file descriptor is associated (usually an input-output stream instance, such as FileInputStream)
    private Closeable parent;
    private List<Closeable> otherParents;

    private boolean closed;
    // Standard input
    public static final FileDescriptor in = new FileDescriptor(0);
    // Standard output
    public static final FileDescriptor out = new FileDescriptor(1);
    // Standard error
    public static final FileDescriptor err = new FileDescriptor(2);

    public boolean valid() {
        return fd != -1;
    }

    /* This routine initializes JNI field offsets for the class */
    private static native void initIDs();

    static {
        initIDs();
    }
}

The FileDescriptor source is straightforward, and we can immediately summarize the following points:

  • A FileDescriptor object corresponds one-to-one to an operating-system file descriptor, whose value is stored in the fd field;
  • A single FileDescriptor can be associated with multiple Closeable instances (usually input/output streams such as FileInputStream);
  • FileDescriptor defines three public static constants, in, out, and err, which represent standard input, standard output, and standard error respectively. They are mostly used by java.lang.System;
  • A valid file descriptor fd is non-negative; -1 marks an invalid descriptor, which is what valid() checks.
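The second point, one FileDescriptor shared by several streams, can be observed from ordinary Java code. The following is a minimal sketch: SharedDescriptorSketch and the temporary file are my own illustration, while getFD() and the FileInputStream(FileDescriptor) constructor are part of the public API.

```java
import java.io.File;
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SharedDescriptorSketch {
    // Reads one byte through each of two streams that wrap the same FileDescriptor.
    static String readThroughTwoStreams() {
        try {
            File tmp = File.createTempFile("fd-demo", ".txt"); // scratch file for the demo
            tmp.deleteOnExit();
            try (FileOutputStream out = new FileOutputStream(tmp)) {
                out.write(new byte[] {'A', 'B'});
            }
            try (FileInputStream first = new FileInputStream(tmp)) {
                FileDescriptor shared = first.getFD();
                // The second stream attaches to the same descriptor instead of opening the file again.
                FileInputStream second = new FileInputStream(shared);
                char a = (char) first.read();
                char b = (char) second.read(); // continues where the first stream stopped
                return "" + a + b + " valid=" + shared.valid();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A descriptor object that was never opened is not valid.
        System.out.println(new FileDescriptor().valid());  // prints "false"
        System.out.println(readThroughTwoStreams());       // prints "AB valid=true"
    }
}
```

Because both streams sit on the same underlying descriptor, they share one file position, so the second read returns B rather than A. Closing either stream closes the descriptor for both.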

initIDs

The initIDs method initializes the field ID of the fd field (which I think of as something like a pointer to the field). It is a native method, and the corresponding JNI implementation can be found in the JDK source code. Take the Windows implementation as an example (because the Windows one is relatively easy to find):

/* field id for jint 'fd' in java.io.FileDescriptor */
jfieldID IO_fd_fdID;

/* field id for jlong 'handle' in java.io.FileDescriptor */
jfieldID IO_handle_fdID;

/**************************************************************
 * static methods to store field IDs in initializers
 */

JNIEXPORT void JNICALL
Java_java_io_FileDescriptor_initIDs(JNIEnv *env, jclass fdClass) {
    CHECK_NULL(IO_fd_fdID = (*env)->GetFieldID(env, fdClass, "fd", "I"));
    CHECK_NULL(IO_handle_fdID = (*env)->GetFieldID(env, fdClass, "handle", "J"));
}

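You can see that the field ID of fd is stored in a global variable, and subsequent native code can read or change the value of fd through that field ID. I think of the field ID as an opaque handle to the field rather than a real pointer. The code also caches the ID of a handle field. It does not appear in the FileDescriptor source shown above because that listing is the Unix version of the class; the Windows version declares an additional handle field of type long.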
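If JNI is unfamiliar, Java reflection offers a rough analogy for "look the field up once, then reuse the handle". The sketch below is only an analogy and not what the JDK does internally: FieldIdSketch and FakeDescriptor are hypothetical classes, and a java.lang.reflect.Field plays the role of the jfieldID.

```java
import java.lang.reflect.Field;

class FieldIdSketch {
    // Stand-in for java.io.FileDescriptor; the real class is not touched here.
    static class FakeDescriptor {
        private int fd = -1;
    }

    // Looked up once, like IO_fd_fdID in initIDs, then reused for every instance.
    private static final Field FD_FIELD;

    static {
        try {
            FD_FIELD = FakeDescriptor.class.getDeclaredField("fd");
            FD_FIELD.setAccessible(true);
        } catch (NoSuchFieldException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Write the field of any instance through the cached handle.
    static void setFd(FakeDescriptor target, int value) {
        try {
            FD_FIELD.setInt(target, value);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Read the field of any instance through the cached handle.
    static int getFd(FakeDescriptor target) {
        try {
            return FD_FIELD.getInt(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        FakeDescriptor d = new FakeDescriptor();
        int before = getFd(d);
        setFd(d, 3);
        return before + " -> " + getFd(d);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "-1 -> 3"
    }
}
```

The lookup by name happens once in the static initializer, which mirrors why the native code calls GetFieldID in initIDs rather than on every I/O call.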

File input/output streams

Java IO has two file streams: FileInputStream for input and FileOutputStream for output. Both have a very simple inheritance structure: FileInputStream extends InputStream, and FileOutputStream extends OutputStream.

Since the two are implemented in much the same way, the following discussion focuses on FileInputStream.

FileInputStream

Take a look at the source of FileInputStream.

There are only a few internal fields; see the comments:

public class FileInputStream extends InputStream {
    /** File descriptor object **/
    private final FileDescriptor fd;
    /** File path **/
    private final String path;
    /** Channel through which the file can be read **/
    private FileChannel channel = null;
    /** The lock object for concurrency control when closed **/
    private final Object closeLock = new Object();
    private volatile boolean closed = false;
}
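Two of these fields can be reached through public methods: getFD() returns the fd object and getChannel() returns the channel. The sketch below (StreamFieldsSketch and its temporary file are my own illustration) checks that the descriptor is valid while the stream is open and that repeated getChannel() calls hand back the same channel object, which fits the field starting out as null and being filled in on first use.

```java
import java.io.File;
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;

class StreamFieldsSketch {
    static String inspect() {
        try {
            File tmp = File.createTempFile("fis-demo", ".txt"); // scratch file for the demo
            tmp.deleteOnExit();
            try (FileInputStream in = new FileInputStream(tmp)) {
                FileDescriptor fd = in.getFD();        // the stream's fd field
                FileChannel first = in.getChannel();   // channel is created on first request
                FileChannel second = in.getChannel();  // later calls return the same object
                return "valid=" + fd.valid()
                        + " sameChannel=" + (first == second)
                        + " open=" + first.isOpen();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(inspect()); // prints "valid=true sameChannel=true open=true"
    }
}
```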

Like FileDescriptor, FileInputStream has an initIDs method, which caches the field ID of its internal fd field:

private static native void initIDs();

private native void close0() throws IOException;

static {
    initIDs();
}

The constructors are also quite simple. The key one is this:

public FileInputStream(File file) throws FileNotFoundException {
    String name = (file != null ? file.getPath() : null);
    SecurityManager security = System.getSecurityManager();
    if (security != null) {
        security.checkRead(name);
    }
    if (name == null) {
        throw new NullPointerException();
    }
    if (file.isInvalid()) {
        throw new FileNotFoundException("Invalid file path");
    }
    // Create a file descriptor object
    fd = new FileDescriptor();
    // Associate the current FileInputStream with the file descriptor
    fd.attach(this);
    path = name;
    // Open the file
    open(name);
}


The constructor creates a new file descriptor object fd. Remember that this object has its own internal fd field of type int, which the no-argument FileDescriptor constructor sets to -1, the value that valid() treats as invalid. The real value is filled in by the open method, which eventually calls the native method open0:

/**
 * Opens the specified file for reading.
 * @param name the name of the file
 */
private native void open0(String name) throws FileNotFoundException;
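The effect of open can be seen from the outside: a freshly created FileDescriptor is invalid, while the descriptor of a successfully opened stream is valid, and a failed open surfaces as FileNotFoundException. A minimal sketch follows; OpenSketch and the paths it uses are assumptions made for the demo.

```java
import java.io.File;
import java.io.FileDescriptor;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;

class OpenSketch {
    // Reports the state of a fresh descriptor and of the one behind an opened stream.
    static String describe(String path) {
        boolean freshValid = new FileDescriptor().valid(); // not yet tied to any open file
        try (FileInputStream in = new FileInputStream(path)) {
            return "fresh=" + freshValid + " opened=" + in.getFD().valid();
        } catch (FileNotFoundException e) {
            return "fresh=" + freshValid + " open failed";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String demo() {
        try {
            File existing = File.createTempFile("open-demo", ".txt"); // scratch file
            existing.deleteOnExit();
            // A path inside a directory that does not exist, so the open must fail.
            File missing = new File(existing.getParentFile(), "no-such-dir-for-demo/none.txt");
            return describe(existing.getPath()) + " | " + describe(missing.getPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // prints "fresh=false opened=true | fresh=false open failed"
    }
}
```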

As an aside, here is how to find the C implementation that corresponds to a native method:

  • Download the corresponding JDK source code;
  • Find jdk/src/share/classes/java/io/FileInputStream.java;
  • To generate the header file java_io_FileInputStream.h, run javah java.io.FileInputStream (on JDK 10 and later, where javah has been removed, javac -h does the same job):
/*
 * Class:     java_io_FileInputStream
 * Method:    open0
 * Signature: (Ljava/lang/String;)V
 */
JNIEXPORT void JNICALL Java_java_io_FileInputStream_open0 (JNIEnv *, jobject, jstring);
  • Searching the source tree for the C function name Java_java_io_FileInputStream_open0 quickly finds the corresponding implementation.
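The function name in the generated header follows the JNI naming convention, so it can also be worked out by hand without generating a header at all. The helper below is a simplified sketch of that convention: JniNameSketch is a hypothetical class, and it deliberately ignores the extra signature suffix that JNI appends for overloaded native methods.

```java
class JniNameSketch {
    // Builds "Java_<mangled class name>_<mangled method name>".
    static String jniFunctionName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    // Simplified JNI name mangling for class and method names.
    private static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');                             // package separator
            } else if (c == '_') {
                sb.append("_1");                            // a literal underscore is escaped
            } else if (c < 128 && Character.isLetterOrDigit(c)) {
                sb.append(c);                               // plain ASCII letters and digits
            } else {
                sb.append(String.format("_0%04x", (int) c)); // everything else as _0xxxx
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(jniFunctionName("java.io.FileInputStream", "open0"));
        // prints "Java_java_io_FileInputStream_open0"
        System.out.println(jniFunctionName("java.io.FileDescriptor", "initIDs"));
        // prints "Java_java_io_FileDescriptor_initIDs"
    }
}
```

The resulting string is what you would search for in the native source tree.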

In the JDK source code (jdk/src/share/native/java/io/FileInputStream.c) we find the following code:

jfieldID fis_fd; /* id for jobject 'fd' in java.io.FileInputStream */

/**************************************************************
 * static methods to store field ID's in initializers
 */

JNIEXPORT void JNICALL
Java_java_io_FileInputStream_initIDs(JNIEnv *env, jclass fdClass) {
    fis_fd = (*env)->GetFieldID(env, fdClass, "fd", "Ljava/io/FileDescriptor;");
}

/**************************************************************
 * Input stream
 */

JNIEXPORT void JNICALL
Java_java_io_FileInputStream_open0(JNIEnv *env, jobject this, jstring path) {
    fileOpen(env, this, path, fis_fd, O_RDONLY);
}

Here you can see the implementation of initIDs. It saves the field ID of the FileInputStream field named "fd" (whose type is FileDescriptor) in the global variable fis_fd.

The open0 method calls fileOpen, which is implemented differently on different operating systems. Take Windows as an example:

void
fileOpen(JNIEnv *env, jobject this, jstring path, jfieldID fid, int flags)
{
    FD h = winFileHandleOpen(env, path, flags);
    if (h >= 0) {
        SET_FD(this, h, fid);
    }
}

We won't expand winFileHandleOpen here; it calls the Windows API to open the file and returns the resulting handle h.

The handle h is then stored by the SET_FD macro. Its fid argument is the field ID cached by initIDs; it identifies the stream's fd field, and SET_FD writes h into the FileDescriptor object that this field refers to.

At this point, the FileInputStream is associated with the file descriptor, and later reads on the stream can locate the open file through its internal fd field. For the same reason, the file streams also offer a constructor that takes a file descriptor directly:

FileOutputStream fdOut = new FileOutputStream(FileDescriptor.out);

This is how java.lang.System creates the stream behind System.out.
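The same construction can be tried in user code. The sketch below builds a second PrintStream on top of FileDescriptor.out; StdoutSketch is a hypothetical class, and the encoding handling that java.lang.System performs is left out.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;

class StdoutSketch {
    // Builds a PrintStream on the standard output descriptor, roughly as System does for System.out.
    static PrintStream openStdout() {
        FileOutputStream fdOut = new FileOutputStream(FileDescriptor.out);
        return new PrintStream(fdOut, true); // true = flush automatically on println
    }

    public static void main(String[] args) {
        PrintStream out = openStdout();
        out.println("written through FileDescriptor.out");
        // Not closed on purpose: closing this stream would close the process's standard output.
    }
}
```

Note that no file is opened here. The stream simply wraps a descriptor that the process already had when it started.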

Conclusion

We studied the concept of file descriptors in Linux and the design idea behind "everything is a file". We also took an in-depth look at the source code of the file I/O stream classes and explored how they use file descriptors to interact with the operating system. I hope you found it useful!

References

In particular, some paragraphs of this article are taken from the following sources.

  • [1] “Progress every day: The Relationship between File descriptors and open files in Linux”
  • [2] “Basic Principles of Linux Process, Thread, and File Descriptors”

Other references

  • Java I/O streams: FileDescriptor
  • File descriptors (Descriptor)
  • How to find native implementations in the JDK
  • In-depth analysis of Java IO (3): Introduction to the Java IO class library