Node.js provides the File System API, which lets you read and write files and directories, modify permissions, create soft links, and so on.

You may be familiar with these APIs without understanding how they work. To really understand the file system, you have to look at its roots.

Let’s try designing a file system from scratch.

Design a file system from scratch

What is a file?

A file is a relatively complete, self-contained piece of information.

But a computer’s persistent storage lives on drives, mainly hard disks and SSDs.

A file in a computer is a logical concept, not a physical entity.

So suppose we are handed such a disk and asked to build a file system on it that provides files. How do we do that?

A first idea: store the files next to each other, one after another.

Then we also need an index that records where each file is stored:

This is fine, but there is a problem. If file B is deleted, the corresponding space should be freed:

Then a file D comes in, and it doesn’t fit: the freed space is too small.

The space freed by B is a fragment. Fragments break up the available disk space into discrete chunks.
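The fragmentation problem can be reproduced with a small sketch. The 10-unit disk, the file sizes, the extra file C stored after B, and the helper names `allocateContiguous` and `freeFile` are all assumptions made up for illustration.

```javascript
// Toy model of contiguous allocation: files are stored back to back on a 10-unit disk.
// All sizes and names here are hypothetical, chosen only to show fragmentation.
const DISK_SIZE = 10;
const index = new Map(); // file name -> { start, length }

// Find the first gap large enough to hold `length` contiguous units.
function allocateContiguous(name, length) {
  const used = [...index.values()].sort((a, b) => a.start - b.start);
  let cursor = 0;
  for (const extent of used) {
    if (extent.start - cursor >= length) break; // the gap before this extent fits
    cursor = extent.start + extent.length;
  }
  if (cursor + length > DISK_SIZE) return false; // no contiguous gap is big enough
  index.set(name, { start: cursor, length });
  return true;
}

function freeFile(name) {
  index.delete(name);
}

allocateContiguous('A', 3); // occupies units 0..2
allocateContiguous('B', 2); // occupies units 3..4
allocateContiguous('C', 4); // occupies units 5..8
freeFile('B');              // leaves a 2-unit hole at 3..4

// 3 units are free in total (3, 4 and 9), but they are not adjacent.
const fitsD = allocateContiguous('D', 3);
console.log(fitsD); // false
```

Even though the disk has enough free space in total, D is rejected because no single gap is large enough.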

What to do? How can we make better use of disk space?

Blocks! Divide each file into small pieces, say 1 KB per block. The blocks need not be stored contiguously, as long as the index records all of them.

The index is hard to make out, so let’s zoom in:

The index of each file records which blocks the data is stored in, so that it can be read out piece by piece.

And once the files are deleted, those blocks can continue to be used and there won’t be large fragments.
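Here is a minimal sketch of the block idea, again with made-up sizes and hypothetical helper names (`writeFile`, `deleteFile`). It repeats the A, B, C, D scenario, but each file may now occupy any free blocks.

```javascript
// Toy model of block-based storage: 10 fixed-size blocks, files need not be contiguous.
// Block count, file sizes and function names are assumptions for illustration.
const BLOCK_COUNT = 10;
const freeBlocks = new Set(Array.from({ length: BLOCK_COUNT }, (_, i) => i));
const fileIndex = new Map(); // file name -> list of block numbers

function writeFile(name, blocksNeeded) {
  if (freeBlocks.size < blocksNeeded) return false; // the disk really is full
  const blocks = [...freeBlocks].sort((a, b) => a - b).slice(0, blocksNeeded);
  blocks.forEach((b) => freeBlocks.delete(b));
  fileIndex.set(name, blocks);
  return true;
}

function deleteFile(name) {
  for (const b of fileIndex.get(name) ?? []) freeBlocks.add(b);
  fileIndex.delete(name);
}

writeFile('A', 3); // blocks 0, 1, 2
writeFile('B', 2); // blocks 3, 4
writeFile('C', 4); // blocks 5, 6, 7, 8
deleteFile('B');   // blocks 3 and 4 become free again

writeFile('D', 3); // fits now: it reuses 3 and 4 and takes 9 as well
console.log(fileIndex.get('D')); // [ 3, 4, 9 ]
```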

Wonderful!

No wonder the first step in file systems is dividing data into blocks and the first step in memory management is paging: both let you use space efficiently.

This kind of index entry can be called an index node, or inode for short.

Moreover, in addition to the file name and stored blocks, other information can be recorded, such as when it was created, when it was modified, file permissions, who it belongs to, and so on.

A file is the information recorded in its inode plus the series of data blocks that the inode indexes.
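As a rough sketch, an inode in this design could be modelled as a plain object. The field names, the 3-character block size, and the owner `alice` are hypothetical; real inode layouts differ.

```javascript
// A hypothetical in-memory inode for the design above; real inodes differ in layout.
// Block number -> contents, 3 characters per block, deliberately out of order on "disk".
const dataBlocks = ['wor', 'hel', 'ld', 'lo '];

function createInode(name, blocks, owner) {
  const now = new Date();
  return {
    name,            // file name (kept in the inode in this design)
    blocks,          // numbers of the data blocks holding the content, in reading order
    createdAt: now,
    modifiedAt: now,
    mode: 0o644,     // permission bits
    owner,
  };
}

// Reading a file means following the inode's block list and joining the pieces.
function readContent(inode) {
  return inode.blocks.map((n) => dataBlocks[n]).join('');
}

const inode = createInode('greeting.txt', [1, 3, 0, 2], 'alice');
console.log(readContent(inode)); // hello world
```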

So how do we know which blocks are taken when a file is written, and which are freed when a file is deleted? The state of each block has to be recorded separately.

A data block has only two states, idle and occupied, so one binary bit is enough. The state of all blocks can then be recorded in a single binary number in which each bit stands for one block. This is called a bitmap; in this case, a block bitmap.
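A block bitmap can be sketched with a `Uint8Array` and a few bit operations. The 16-block disk and the helper names are assumptions for illustration.

```javascript
// Block bitmap sketch: one bit per block, 0 = idle, 1 = occupied.
// The 16-block size and helper names are assumptions for illustration.
const BLOCKS = 16;
const bitmap = new Uint8Array(Math.ceil(BLOCKS / 8)); // 16 bits fit in 2 bytes

function markUsed(n) {
  bitmap[n >> 3] |= 1 << (n & 7); // byte n/8, bit n%8
}

function markFree(n) {
  bitmap[n >> 3] &= ~(1 << (n & 7));
}

function isUsed(n) {
  return (bitmap[n >> 3] & (1 << (n & 7))) !== 0;
}

// Scan for the first idle block, or return -1 when every block is occupied.
function findFree() {
  for (let n = 0; n < BLOCKS; n++) {
    if (!isUsed(n)) return n;
  }
  return -1;
}

markUsed(0);
markUsed(1);
markUsed(9);
markFree(1);
console.log(isUsed(0), isUsed(1), isUsed(9)); // true false true
console.log(findFree()); // 1
```

Tracking 16 blocks costs only 2 bytes here, which is why a bitmap is a compact way to store this state.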

Most of the disk consists of data blocks. The first section holds the blocks that store inodes and the block bitmap for the data blocks.

Inodes are also stored in blocks. If, for example, only 5 blocks are reserved for inodes, then the number of inodes is limited.

We also need to record the idle state of all inodes, which is the inode bitmap.

A quick overview of the file system we designed:

To make better use of disk space, we divide data into blocks and record in the inode which blocks each file uses. The inode also records a file’s creation time, modification time, and permissions.

The block bitmap records the idle state of data blocks, and the inode bitmap records the idle state of inodes.

But what if I want to know how many blocks and inodes the disk has, and how many of them are in use?

Simple: traverse the block bitmap and the inode bitmap and count.

But doing this calculation every time is too slow. It is like designing a database for a forum: the number of posts in a board is not computed with a SQL query each time. Instead, a field in a table is updated whenever a post is added or deleted, and that field is read directly.

So let’s also set aside a block to store these statistics: the total and free counts of blocks and inodes.

This block holds higher-level statistics, so we can call it a superblock.
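The idea of maintaining counters instead of rescanning can be sketched as follows. The totals, the field names, and the `createFile` helper are hypothetical.

```javascript
// Superblock sketch: counters updated on every allocation instead of rescanning bitmaps.
// Totals and field names are hypothetical.
const superblock = { totalBlocks: 8, freeBlocks: 8, totalInodes: 4, freeInodes: 4 };
const blockBitmap = new Array(superblock.totalBlocks).fill(false);
const inodeBitmap = new Array(superblock.totalInodes).fill(false);

// Claim the first idle slot in a bitmap and return its number (-1 if none is left).
function claim(bitmapArray) {
  const n = bitmapArray.indexOf(false);
  if (n !== -1) bitmapArray[n] = true;
  return n;
}

function createFile(blocksNeeded) {
  if (superblock.freeInodes < 1 || superblock.freeBlocks < blocksNeeded) return null;
  const inodeNumber = claim(inodeBitmap);
  const blocks = Array.from({ length: blocksNeeded }, () => claim(blockBitmap));
  superblock.freeInodes -= 1;             // keep the statistics in step
  superblock.freeBlocks -= blocksNeeded;  // with every allocation
  return { inodeNumber, blocks };
}

createFile(3);
createFile(2);
console.log(superblock.freeBlocks, superblock.freeInodes); // 3 2

// A full scan gives the same answer, but has to visit every bit.
const scannedFree = blockBitmap.filter((used) => !used).length;
console.log(scannedFree); // 3
```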

Now we can hand the file system we designed to users, and it can be used efficiently.

Release: Shenguang file system V1.0.

But our file system is not very usable yet: it can only create files. What if I create 1000 files?

Lookups are slow and name conflicts are likely.

What to do?

Namespaces! Directories! This is the idea behind folders.

So how do you do that?

Each inode is a file, so let’s organize the inodes into a tree.

Let’s say we have two files B and C:

Let’s create a directory A:

Add an isDirectory attribute to the inode. If the inode is a directory, its data blocks are read to find the inode numbers of its children.

This is how directories are implemented: the isDirectory attribute of the inode distinguishes files from directories. If the inode is a directory, the inode information in its data block is read to find the files it contains. If the inode is a file, the data block is read directly as the file content.
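A minimal sketch of this rule, using an in-memory inode table. The inode numbers and the `read` helper are made up, and names are stored in the inode as in this design.

```javascript
// Directory sketch: an inode table where a directory's data is a list of inode numbers.
// Inode numbers and names are made up; names live in the inode, as in this design.
const inodes = {
  1: { name: 'A', isDirectory: true, data: [2, 3] },
  2: { name: 'B', isDirectory: false, data: 'content of B' },
  3: { name: 'C', isDirectory: false, data: 'content of C' },
};

// For a directory the data block is a list of inode numbers;
// for a file it is the content itself.
function read(inodeNumber) {
  const inode = inodes[inodeNumber];
  if (inode.isDirectory) {
    return inode.data.map((n) => inodes[n].name);
  }
  return inode.data;
}

console.log(read(1)); // [ 'B', 'C' ]
console.log(read(2)); // content of B
```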

The sequence of lookups from one inode to the next is called a path.

For example, the inode lookup order of a file looks like this:

The file path is /A/D/dongdong.jpg

This is the essence of a file path: it is the order in which inodes are looked up.
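Path lookup can be sketched as a walk over an inode table, one path segment at a time. The inode numbers and the `resolve` helper are hypothetical; the path is the one used in the text.

```javascript
// Path resolution sketch: walk from the root inode, one path segment at a time.
// The inode numbers are hypothetical; the path is the one used in the text.
const inodeTable = {
  0: { name: '', isDirectory: true, children: [1] }, // root
  1: { name: 'A', isDirectory: true, children: [2] },
  2: { name: 'D', isDirectory: true, children: [3] },
  3: { name: 'dongdong.jpg', isDirectory: false, children: [] },
};

// Returns the list of inode numbers visited, or null if the path does not exist.
function resolve(path) {
  let current = 0; // start at the root inode
  const visited = [current];
  for (const segment of path.split('/').filter(Boolean)) {
    const dir = inodeTable[current];
    if (!dir.isDirectory) return null; // cannot descend into a file
    const next = dir.children.find((n) => inodeTable[n].name === segment);
    if (next === undefined) return null; // no such entry in this directory
    current = next;
    visited.push(current);
  }
  return visited;
}

console.log(resolve('/A/D/dongdong.jpg')); // [ 0, 1, 2, 3 ]
console.log(resolve('/A/missing.jpg'));    // null
```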

We now support nested directories, so files and directories can be organized into a tree for easy management.

Release version: Shenguang file system V2.0.

Right now an inode has only one path, because the structure is a tree. What if I want two paths to reach the same inode?

For example, suppose /A/D/dongdong.jpg should also be reachable as:

/A/B/dongdong.jpg

Just make directory B point to the same inode.

Now there really are two paths to the same file. This extra link is what we call a hard link.

But because a node now has two parents, the structure is no longer a tree but a graph. So strictly speaking, “file tree” is inaccurate; “file graph” might be closer.

But what if I want to rename dongdong.jpg to dongdong2.jpg under /A/B only?

Both paths lead to the same inode, so the name changes for both. But I only want to change the file name under /A/B, nothing else.
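A sketch of the hard link, and of the renaming problem it causes in this design. The inode numbers and the `hardLink` helper are hypothetical, and the name is stored in the inode as described above.

```javascript
// Hard link sketch: two directories list the same inode number.
// Inode numbers are hypothetical; names are stored in the inode, as in this design.
const table = {
  1: { name: 'A', isDirectory: true, children: [2, 3] },
  2: { name: 'D', isDirectory: true, children: [4] },
  3: { name: 'B', isDirectory: true, children: [] },
  4: { name: 'dongdong.jpg', isDirectory: false, content: 'image bytes' },
};

// Adding a hard link only appends an existing inode number to another directory.
function hardLink(directoryNumber, targetNumber) {
  table[directoryNumber].children.push(targetNumber);
}

hardLink(3, 4); // /A/B now also contains inode 4

const viaD = table[table[2].children[0]];
const viaB = table[table[3].children[0]];
console.log(viaD === viaB); // true: both paths reach the very same inode

// Because the name lives in the inode here, a rename is visible through both paths.
viaB.name = 'dongdong2.jpg';
console.log(viaD.name); // dongdong2.jpg
```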

Create another inode that carries the new name and points to dongdong.jpg.

Instead of pointing directly at the target, we go through an extra inode that carries its own name and then points at the target. We call this a soft link.

Why is it called hard? Because you can’t change it: it points directly to the same inode.

Why is it called soft? Because it can be changed: there is an extra layer of inode that carries its own name.

So we call them hard and soft links.

Hard links and soft links both let multiple paths reach the same file. In our design, however, a hard link cannot be given a separate name, while a soft link can.
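The extra inode behind a soft link can be sketched like this. The `kind` and `target` fields, the inode numbers, and the `open` helper are assumptions for illustration.

```javascript
// Soft link sketch: an extra inode with its own name whose data points at the target.
// Inode numbers and the `kind` / `target` fields are assumptions for illustration.
const nodes = {
  4: { name: 'dongdong.jpg', kind: 'file', content: 'image bytes' },
  5: { name: 'dongdong2.jpg', kind: 'symlink', target: 4 },
};

// Follow soft links until a non-link inode is reached.
function open(inodeNumber) {
  let inode = nodes[inodeNumber];
  while (inode.kind === 'symlink') {
    inode = nodes[inode.target];
  }
  return inode;
}

console.log(nodes[5].name);   // dongdong2.jpg (the link's own name)
console.log(open(5).content); // image bytes (the same data as inode 4)
console.log(nodes[4].name);   // dongdong.jpg (the target keeps its name)
```

The link and the target are separate inodes, so renaming one leaves the other untouched.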

Monorepo tooling is typically built on soft links: a link can point to the same directory inode and still have its own name.

With hard and soft links implemented, we can ship a new version.

Release version: Shenguang file system V3.0.

Let’s review the file system we designed:

V1.0:

Data blocks store file contents, which improves disk utilization.

The inode records the data blocks used, plus the creation time and permissions of the file.

The block bitmap records the idle state of data blocks.

The inode bitmap records the idle state of inodes.

The superblock records statistics about inodes and data blocks.

This version implements file access, but does not support directories.

V2.0:

An attribute on the inode records whether it is a file or a directory.

The data block of a directory stores the inode information of the files it contains, so the file list can be obtained by reading the directory.

The sequence of directory inodes leading to a file inode is called the file path.

This version implements directory and path functionality.

V3.0:

A hard link lets multiple paths reach the same file by listing the same inode in multiple directory inodes.

A soft link is an extra inode, created in a directory, that carries its own name and points to the target inode.

This version implements hard and soft links, which let multiple paths reach the same file.

Real file systems work along similar lines. There are many of them, such as ext2 and FAT; ext2 in particular, with its superblock, bitmaps, inodes, and data blocks, is close to our design.

With the file system design done, let’s return to the original goal: understanding the Node.js File System API.

Node.js file system API

Node.js exposes the fs API to JavaScript running on V8. Underneath, it calls the operating system’s file system functions through C++, and those functions operate on a file system like the one we designed above.

The fs API calls we make end up invoking the operating system’s file system functionality.

Having designed a file system, let’s look at the fs API again and see if we understand it better:

  • fs.stat reads the information stored in the inode
  • fs.chmod modifies the file permissions stored in the inode
  • fs.chown modifies the owning user stored in the inode
  • fs.copyFile copies the inode and data blocks and adds the new inode to the contents of the target directory’s inode
  • fs.link creates a hard link and fs.symlink creates a soft link, so that multiple paths reach the same file inode; a soft link can also carry its own name
  • fs.mkdir creates a directory inode
  • fs.readdir reads all file inodes contained in the directory inode
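To see some of these calls against a real file system, here is a small sketch using the synchronous variants from `node:fs`. It uses `fs.linkSync` for the hard link and `fs.symlinkSync` for the soft link. The file names are arbitrary, and it assumes a POSIX-like system where creating symlinks needs no special privileges.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Work in a throwaway directory; the file names are arbitrary. Assumes a POSIX-like
// system where creating symlinks needs no special privileges.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fs-demo-'));
const original = path.join(dir, 'dongdong.txt');
const hard = path.join(dir, 'hard.txt');
const soft = path.join(dir, 'soft.txt');

fs.writeFileSync(original, 'hello');
fs.linkSync(original, hard);    // hard link: a second path to the same inode
fs.symlinkSync(original, soft); // soft link: a separate inode that points at the path

// stat exposes inode information: the inode number and the number of hard links.
const sameInode = fs.statSync(original).ino === fs.statSync(hard).ino;
const linkCount = fs.statSync(original).nlink;
// lstat inspects the link itself instead of following it to the target.
const softIsLink = fs.lstatSync(soft).isSymbolicLink();
// readdir lists the entries recorded in the directory.
const entries = fs.readdirSync(dir).sort();

console.log(sameInode, linkCount, softIsLink);
console.log(entries);

fs.rmSync(dir, { recursive: true, force: true }); // clean up the temporary directory
```

The hard link reports the same inode number as the original file, while the soft link is its own file system object that merely refers to the original path.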

There are many more APIs, but all of them manipulate a file system like the one we designed above.

With an understanding of the file system’s roots, these APIs become easy to pick up.

Conclusion

To really understand the fs module of Node.js, we designed a file system together:

  • Divide files into different data blocks to make efficient use of disk space.

  • The index node (inode) of a file records its data blocks, creation time, permissions, and whether it is a directory.

  • Data block idle state is recorded by block bitmap.

  • The inode bitmap records the inode idle state.

  • The superblock records the usage of the disk’s inodes and data blocks.

  • A directory is implemented by having the data block of its inode hold the list of inodes of the files it contains.

We draw some important conclusions:

Files are essentially inode + blocks of data.

A path is essentially the sequence of lookups that leads to the target inode.

Hard links are essentially multiple directory inodes containing the same inode.

A soft link essentially creates one more inode that carries its own name, and its data block points to the target inode.

Node.js’s fs API is exposed to JavaScript through C++ bindings and calls the operating system’s file system capabilities. These APIs are easy to learn once you understand the file system.