I’ve written about the source code analysis of Linux file systems before, but analyzing file systems from a source perspective is a bit boring (and not novice friendly), so this time I’ll try to explain the principles of Linux file systems with diagrams rather than diving into source code.
I. Introduction to hard disks
Before introducing file systems, let’s take a look at hard drives.
It is well known that data held in memory is lost when the power is turned off, so modern computers use hard disks to store data. In other words, data on a hard drive survives a power outage.
The popular types of hard drives are mechanical hard disk drives (HDDs) and solid-state drives (SSDs). Since this article focuses on file systems, the working principle of hard disks will not be explained in depth. Here’s a comparison of mechanical drives and solid-state drives:
We can think of a hard disk as a huge array, with each element representing a block of data, as shown below:
In this article we assume each data block is 4 KB (a common block size on Linux), so a 128 GB disk can be divided into 33,554,432 data blocks. The kernel reads and writes data blocks based on block numbers.
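The arithmetic behind block numbers can be sketched in a few lines of C. This is only an illustration under the 4 KB block size assumed above; the names BLOCK_SIZE, block_count, block_offset, and block_of_offset are hypothetical and do not come from the kernel.

```c
#include <stdint.h>

/* Block size assumed in the text: 4 KB. */
#define BLOCK_SIZE 4096ULL

/* Number of whole blocks on a disk of the given size in bytes. */
uint64_t block_count(uint64_t disk_bytes)
{
    return disk_bytes / BLOCK_SIZE;
}

/* Byte offset on the disk at which a given block starts. */
uint64_t block_offset(uint64_t block_no)
{
    return block_no * BLOCK_SIZE;
}

/* Number of the block that contains a given byte offset. */
uint64_t block_of_offset(uint64_t byte_offset)
{
    return byte_offset / BLOCK_SIZE;
}
```

For a 128 GB disk (128 × 2^30 bytes), block_count returns the 33,554,432 blocks mentioned above.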
II. What is a file system
As mentioned earlier, the kernel reads and writes to the hard disk in the form of blocks of data, but this is very unintuitive to humans because it is impossible to remember what data each block holds.
To make it more convenient and intuitive for users, the Linux kernel abstracts two concepts to manage data on hard disks: files and directories.
- File: Used to store data.
- Directory: Used to store a list of files. A directory can also contain other directories.
Data is stored in disk data blocks, so the file only needs to record which data blocks belong to the current file. As shown below:
As you can see from the figure above, directories can hold both files and directories. Each file records the numbers of the data blocks that belong to it, so reading or writing a file only requires locating the corresponding data blocks.
III. MINIX file system implementation
Now, let’s use the MINIX file system to explain the design principles of the file system in detail. Because the MINIX file system is so simple, it is suitable for educational use.
1. MINIX files and directories
In the MINIX file system, a file is described as a minix2_inode object. Let’s look at the definition of minix2_inode:
struct minix2_inode {
    __u16 i_mode;      // File type and permissions
    __u16 i_nlinks;    // Link count
    __u16 i_uid;       // User ID
    __u16 i_gid;       // Group ID
    __u32 i_size;      // File size
    __u32 i_atime;     // Access time
    __u32 i_mtime;     // Modification time
    __u32 i_ctime;     // Inode change time
    __u32 i_zone[10];  // Numbers of the data blocks that belong to the file
};
We need to pay special attention to the i_zone field of the minix2_inode object, which is used to record the block number belonging to the current file. By definition, i_zone is an integer array of 10 elements. Does this mean that MINIX files can only hold 40 KB of data?
The answer is no, because the MINIX file system divides the i_zone array into four parts: The first seven elements point directly to the block number where the data is stored, that is, the data will be stored directly on these blocks, while the eighth element is a first-level indirection, the ninth element is a second-level indirection, and the tenth element is a third-level indirection. Let’s look at the following diagram to illustrate this relationship:
With this multi-level indirection scheme, a single MINIX file can hold far more than 40 KB of data.
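To make the four parts of i_zone concrete, here is a minimal sketch that classifies a block index within a file by the level of indirection needed to reach it. It assumes the 4 KB block size used in this article and 32-bit block numbers (so one block holds 1024 block numbers); zone_level and max_file_blocks are hypothetical helper names, not kernel functions.

```c
#include <stdint.h>

#define BLOCK_SIZE     4096ULL             /* block size assumed in the text */
#define PTRS_PER_BLOCK (BLOCK_SIZE / 4ULL) /* 32-bit block numbers per block */
#define DIRECT_ZONES   7ULL                /* i_zone[0] .. i_zone[6] */

/*
 * Returns 0 if the n-th block of a file is reached directly,
 * 1, 2 or 3 for first-, second- or third-level indirection,
 * and -1 if the index cannot be addressed at all.
 */
int zone_level(uint64_t file_block)
{
    uint64_t single = PTRS_PER_BLOCK;
    uint64_t dbl    = single * PTRS_PER_BLOCK;
    uint64_t triple = dbl * PTRS_PER_BLOCK;

    if (file_block < DIRECT_ZONES)
        return 0;
    file_block -= DIRECT_ZONES;
    if (file_block < single)
        return 1;
    file_block -= single;
    if (file_block < dbl)
        return 2;
    file_block -= dbl;
    if (file_block < triple)
        return 3;
    return -1;
}

/* Total number of blocks addressable through i_zone under these assumptions. */
uint64_t max_file_blocks(void)
{
    uint64_t single = PTRS_PER_BLOCK;
    uint64_t dbl    = single * PTRS_PER_BLOCK;
    uint64_t triple = dbl * PTRS_PER_BLOCK;

    return DIRECT_ZONES + single + dbl + triple;
}
```

Note that this is only the addressing limit of i_zone. Because i_size is a 32-bit field, the size a file can actually report is lower, and the super block also records its own maximum in s_max_size.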
There are objects that describe files, so there should be objects that describe directories, too? In the MINIX file system, directories are also described using minix2_inode objects. So how do you tell a file from a directory?
The minix2_inode object has a field named i_mode, which holds the corresponding type of minix2_inode. Normal files are represented by the S_IFREG flag and directories by S_IFDIR. So in essence, a directory is a special kind of file.
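A minimal sketch of how the type could be tested from i_mode follows. The constants mirror the values Linux uses for S_IFMT, S_IFDIR, and S_IFREG, but they are redefined here under hypothetical TOY_ names so the example stays self-contained; inode_is_dir and inode_is_regular are illustrative helpers, not MINIX or kernel functions.

```c
#include <stdint.h>

/* Local copies of the file type bits, matching the values used on Linux. */
#define TOY_S_IFMT  0170000u /* mask selecting the file type bits */
#define TOY_S_IFDIR 0040000u /* directory */
#define TOY_S_IFREG 0100000u /* regular file */

/* Nonzero if i_mode describes a directory. */
int inode_is_dir(uint16_t i_mode)
{
    return (i_mode & TOY_S_IFMT) == TOY_S_IFDIR;
}

/* Nonzero if i_mode describes a regular file. */
int inode_is_regular(uint16_t i_mode)
{
    return (i_mode & TOY_S_IFMT) == TOY_S_IFREG;
}
```

The remaining low bits of i_mode hold the permission bits, which is why the type is compared after masking rather than tested against the whole field.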
The data blocks of an ordinary file store file data, but what do the data blocks of a directory store? The answer is a list of files, each of which is represented by a minix_dir_entry object, defined as follows:
struct minix_dir_entry {
__u16 inode;
char name[0];
};
- inode: The index, in the inode table, of the minix2_inode object that corresponds to the file. We can ignore this field for the moment; it is described later.
- name: Records the name of the file. Since the length of a file name is not fixed, a flexible array member (variable-size data) is used here.
Let’s look at the following figure to show the difference between the data content pointed to by files and directories:
The figure above illustrates two distinct differences between files and directories:
- The i_mode field of a file is set to S_IFREG, while the i_mode field of a directory is set to S_IFDIR.
- The i_zone field of a file points to data blocks that store file data, while the i_zone field of a directory points to data blocks that store a list of files.
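The following sketch shows how a name could be looked up in one directory data block. It makes several assumptions that are not stated in the text: entries are fixed-size 32-byte slots (a 2-byte inode number followed by a 30-byte, NUL-padded name), the inode number is stored little-endian, and an inode number of 0 marks an unused slot. The helpers dir_put_entry and dir_lookup are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRENT_SIZE 32 /* assumed size of one on-disk entry */
#define NAME_LEN    30 /* assumed maximum name length */

/* Write one entry into the given slot of a directory block (test helper). */
void dir_put_entry(unsigned char *block, size_t slot,
                   uint16_t ino, const char *name)
{
    unsigned char *e = block + slot * DIRENT_SIZE;
    size_t n = strlen(name);

    if (n > NAME_LEN)
        n = NAME_LEN;
    memset(e, 0, DIRENT_SIZE);
    e[0] = (unsigned char)(ino & 0xff); /* little-endian inode number */
    e[1] = (unsigned char)(ino >> 8);
    memcpy(e + 2, name, n);
}

/* Return the inode number stored for name, or 0 if it is not present. */
uint16_t dir_lookup(const unsigned char *block, size_t block_len,
                    const char *name)
{
    size_t want = strlen(name);

    if (want == 0 || want > NAME_LEN)
        return 0;

    for (size_t off = 0; off + DIRENT_SIZE <= block_len; off += DIRENT_SIZE) {
        const unsigned char *e = block + off;
        uint16_t ino = (uint16_t)(e[0] | (e[1] << 8));
        const char *ename = (const char *)(e + 2);
        const void *nul;
        size_t elen;

        if (ino == 0)
            continue; /* unused slot */

        /* The stored name is NUL-padded and may fill all NAME_LEN bytes. */
        nul = memchr(ename, '\0', NAME_LEN);
        elen = nul ? (size_t)((const char *)nul - ename) : (size_t)NAME_LEN;

        if (elen == want && memcmp(ename, name, want) == 0)
            return ino;
    }
    return 0;
}
```

Comparing lengths before comparing bytes prevents a prefix such as "hom" from matching the entry "home".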
2. Format the MINIX file system
Now that we have a basic understanding of how the MINIX file system stores files and directories, we will look at how the MINIX file system lays out its management structures on the hard disk, a process known as formatting.
As mentioned earlier, we can think of the hard disk as a large array of data blocks, and the MINIX file system will divide the hard disk into the following parts, as shown below:
Let’s explain these parts below:
- Boot block: Occupies one data block and is used for operating system startup. We can ignore it here.
- Super block: Occupies one data block and stores information about the file system, for example how many data blocks the inode bitmap occupies and how many data blocks the block bitmap occupies. The MINIX file system uses a minix_super_block object to hold this information.
- Inode bitmap: Occupies several data blocks and describes which elements of the inode table are in use. Each bit represents the usage state of one inode.
- Block bitmap: Occupies several data blocks and describes which elements of the data block list are in use. Each bit represents the usage state of one data block.
- Inode table: Occupies several data blocks and consists of multiple minix2_inode objects. Each minix2_inode object represents a file or directory.
- Data block list: Occupies several data blocks, which are used for storing file data.
This is the format structure of the MINIX file system on the hard disk. Let’s first look at the information recorded by the super block, which is represented by the minix_super_block object and is defined as follows:
struct minix_super_block {
    __u16 s_ninodes;        // Number of elements in the inode table
    __u16 s_nzones;         // Number of elements in the data block list (V1)
    __u16 s_imap_blocks;    // Number of data blocks occupied by the inode bitmap
    __u16 s_zmap_blocks;    // Number of data blocks occupied by the block bitmap
    __u16 s_firstdatazone;  // Number of the first data block
    __u16 s_log_zone_size;  // log2 of the zone size in blocks
    __u32 s_max_size;       // Maximum file size
    __u16 s_magic;          // Magic number (identifies the MINIX file system)
    __u16 s_state;          // File system state
    __u32 s_zones;          // Number of elements in the data block list (V2)
};
The purpose of each minix_super_block field is explained in its comment, and we can learn about the layout of a MINIX file system from the minix_super_block object.
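As an illustration of how the super block fields describe the on-disk layout, the sketch below computes the starting block of each region. It assumes the regions are laid out back to back in the order listed above (boot block at block 0, super block at block 1), and that a minix2_inode occupies 64 bytes, which follows from its field sizes. The struct minix_layout and the function minix_compute_layout are hypothetical names for this example.

```c
#include <stdint.h>

/* 4 x __u16 + 4 x __u32 + 10 x __u32 = 64 bytes per minix2_inode. */
#define MINIX2_INODE_SIZE 64u

/* Starting block numbers of each region (hypothetical helper type). */
struct minix_layout {
    uint32_t imap_start;    /* first block of the inode bitmap */
    uint32_t zmap_start;    /* first block of the block bitmap */
    uint32_t itable_start;  /* first block of the inode table */
    uint32_t itable_blocks; /* blocks occupied by the inode table */
    uint32_t data_start;    /* first block of the data block list */
};

struct minix_layout minix_compute_layout(uint32_t ninodes,
                                         uint32_t imap_blocks,
                                         uint32_t zmap_blocks,
                                         uint32_t block_size)
{
    struct minix_layout l;

    l.imap_start    = 2; /* block 0: boot block, block 1: super block */
    l.zmap_start    = l.imap_start + imap_blocks;
    l.itable_start  = l.zmap_start + zmap_blocks;
    /* Round up so a partially filled last block is still counted. */
    l.itable_blocks = (ninodes * MINIX2_INODE_SIZE + block_size - 1) / block_size;
    l.data_start    = l.itable_start + l.itable_blocks;
    return l;
}
```

On a real file system the start of the data area does not need to be recomputed this way, since the super block stores it directly in s_firstdatazone.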
3. File reading process
Now that you know the structure of the MINIX file system, let’s look at the process of reading files from the MINIX file system.
For example, if we want to read the contents of the /home/file.txt file, how does the MINIX file system accurately find and read the contents of the file? Let’s describe this process in steps.
Step 1: Read the root directory
To read the /home/file.txt file, start at the root directory /. By convention, the MINIX file system stores the root directory in the first element of the inode table, as shown below:
As shown in the figure above, the root directory is stored in the first element of the inode table. We then look up the home directory in the root directory’s list of files. The inode index of the home directory is 5, indicating that the home directory is stored in the 5th element of the inode table.
Step 2: Read the home directory
Since the inode index of the home directory is 5, we read the 5th element of the inode table to obtain the inode of the home directory, and then look up file.txt in its list of files, as shown below:
As shown above, the list of files in the home directory gives 9 as the inode index of file.txt, so we can now obtain the inode of file.txt by reading the ninth element of the inode table.
Step 3: Read the contents of file.txt
Now that we know the inode index of file.txt, we can obtain the inode of file.txt by reading element 9 of the inode table. The contents of the file can then be read from the data blocks pointed to by the i_zone field of the inode, as shown below:
As shown in the figure above, after obtaining the inode node of file.txt by reading the ninth element of the inode table, the contents of the file can be read from the data block pointed to by the i_zone field of the inode node.
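The three steps above can be sketched as a small in-memory simulation. This is not how MINIX stores data on disk: the toy_inode and toy_dirent types, the fixed-size entries array, and the functions toy_lookup and toy_resolve are all hypothetical, and the table is indexed by inode number with the root directory assumed to be inode 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROOT_INO    1 /* assumed inode number of the root directory */
#define MAX_ENTRIES 4 /* entries per toy directory */

struct toy_dirent {
    uint16_t inode;   /* 0 marks an unused entry */
    const char *name;
};

struct toy_inode {
    int is_dir;                             /* stands in for S_IFDIR */
    struct toy_dirent entries[MAX_ENTRIES]; /* used when is_dir != 0 */
    const char *data;                       /* used for regular files */
};

/* Look up one path component in a directory; returns 0 if not found. */
uint16_t toy_lookup(const struct toy_inode *dir, const char *name, size_t len)
{
    if (!dir->is_dir)
        return 0;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        const struct toy_dirent *e = &dir->entries[i];
        if (e->inode != 0 && strlen(e->name) == len &&
            memcmp(e->name, name, len) == 0)
            return e->inode;
    }
    return 0;
}

/* Resolve an absolute path to an inode number; returns 0 on failure. */
uint16_t toy_resolve(const struct toy_inode *table, size_t table_len,
                     const char *path)
{
    uint16_t ino = ROOT_INO;

    if (path[0] != '/' || table_len <= ROOT_INO)
        return 0;

    while (*path) {
        const char *end;

        while (*path == '/') /* skip separators */
            path++;
        if (!*path)
            break;
        end = path;
        while (*end && *end != '/') /* find the end of this component */
            end++;

        ino = toy_lookup(&table[ino], path, (size_t)(end - path));
        if (ino == 0 || ino >= table_len)
            return 0;
        path = end;
    }
    return ino;
}
```

With a table in which inode 1 (the root) contains home with inode 5, and inode 5 contains file.txt with inode 9, resolving /home/file.txt walks inode 1, then inode 5, and returns 9, matching the steps described above.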
As an additional note, the inode bitmap and data block bitmap are used to quickly find which inode nodes and data blocks are not being used when creating files.
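A minimal sketch of such a bitmap search is shown below. It assumes that bit i of the map lives in byte i / 8 at position i % 8 (least significant bit first), which is a choice made for this example and not a statement about the MINIX on-disk format; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first clear (unused) bit, or -1 if all are set. */
long bitmap_find_free(const uint8_t *map, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++) {
        if (!(map[i / 8] & (1u << (i % 8))))
            return (long)i;
    }
    return -1;
}

/* Mark a bit as used. */
void bitmap_set(uint8_t *map, size_t bit)
{
    map[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/* Mark a bit as free again. */
void bitmap_clear(uint8_t *map, size_t bit)
{
    map[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}

/* Find a free bit and mark it used; returns its index or -1 if full. */
long bitmap_alloc(uint8_t *map, size_t nbits)
{
    long i = bitmap_find_free(map, nbits);

    if (i >= 0)
        bitmap_set(map, (size_t)i);
    return i;
}
```

Creating a file would call something like bitmap_alloc once on the inode bitmap and once per needed block on the block bitmap; deleting it would clear the same bits.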
IV. Summary
This article introduced how a file system is designed, using MINIX, a simple file system, as the example. Although Linux supports many kinds of file systems, the basic idea behind all of them is how to manage the data on the hard disk effectively. Understanding the design of the MINIX file system is therefore very helpful in understanding other file systems.