The file system

File system is an important part of the operating system, is the organization of the file management, this article on the main disk on how the file storage organization and file operation of two parts, nonsense not to say directly.

File storage

There is a familiar Linux saying that everything is a file, and this is also the philosophy of UNIX. Ordinary files, directories, block/character devices, network device sockets, etc., are all files to Linux, and although they are of different types, the Linux system provides them with a uniform set of interfaces for unified treatment and operation.

This article will not talk about many kinds of files, just to talk about the files stored on disk, the key problem for files to be stored on disk is to record which disk blocks are used for each file, generally there are some methods.

Continuous distribution

Contiguous allocation, as the name implies, places files in contiguous physical space, as shown in the figure below:

Continuous allocation means that files are stored in consecutive data blocks. Two files cannot share one data block. For example, if a file actually uses 3.5 data blocks, it actually uses 4 data blocks. As mentioned earlier, the smallest unit of file system operations is a block, which cannot be subdivided.

Continuous distribution has the following advantages:

  • Implementation is relatively simple, and the contiguously allocated representation only needs to record the address and length of the starting block.
  • The read/write efficiency is high because the data blocks are closely contiguously connected, and the entire file can be read out in one disk seek.

But the drawbacks are obvious:

  • If the new file is smaller than 3 data blocks, it can be put down. If the new file is larger than 3 data blocks, it can not be put down. Because to allocate space continuously, you must find a suitable contiguous space when allocating space. If it does not exist, it does not matter that the sum of all the fragments is greater than the space required by the file.
  • File length is not easy to expand, as shown in the figure above, the files are close to each other. To expand a file, it is ok if there is enough free space behind it. If there is not, it is necessary to find another free space and move the whole file, which is very inefficient and troublesome.

Distribution of the list

Each data block occupied by the linked list structure file does not need to be placed continuously. Each data block has a certain space as a pointer to the next one. Just like the linked list, data block is equivalent to node, and the diagram is as follows:

The advantages and disadvantages of this form of organization are also obvious, that is, the advantages and disadvantages of the chain structure of the data structure, the use of linked list file data blocks do not have to be stored sequentially, sequential read more convenient but random access is more troublesome. When a data block is corrupted or lost, the entire file may be corrupted or lost.

Also, the block size is usually a power of two, but the size of each block is no longer a power of two when organizing the file this way. Many programs require powers of two to read and write data, which can cause additional overhead.

To solve the above problems, a solution is proposed, which is to organize all block Pointers together to form a Table and put it in memory. This Table is called FAT (File Allocation Table).

As shown in the figure above, file A uses blocks 4, 0, 6, and 2 in sequential order, and file B uses blocks 5, 1, 3, 10, and 8.

Since each disk block has a record, the file allocation table is proportional to the disk size. When the disk is too large, the file allocation table will take up a lot of memory, so this scheme is not suitable for large disks.

Index structure

The FAT structure keeps all the disk block Pointers together in memory so that the disk blocks used by the file can be connected back and forth, while the index structure maintains the disk block addresses used by each file separately. This means creating an index data structure for each file, which contains the addresses of each disk block used by the file. This is direct indexing, and there are one or more levels of indexing, which means that the address is not the data block, but the index block, which in turn holds the address of the data block or the index block.

Let’s take a look at the inode, the index structure used by the Linux operating system. Inodes contain a lot of information, but let’s just start with the addressing part and look at the general structure:

For example, the inode shown in the figure above has 12 direct indexes, 1 primary index, 1 secondary index, and 1 tertiary index, and assumes that each block is 1 KB. If an address entry requires 4 bytes to represent, what is the maximum size of the file that the inode can represent?

12×1=12KB12 \times 1=12KB12 ×1=12KB

Each index block can store 1KB÷4B=2561KB\div4B=2561KB÷4B=256 addresses

Level indexes: 256 x 1 KB = 256 KB = 4 mb256 \ mb256 times1KB = 256 KB = 4 x 1 KB = 256 KB = 4 MB

Secondary index: 256×256×1KB=216KB=64M256\times256\times1KB=2^{16}KB=64M256×256×1KB=216KB=64M

3 index: 256×256×256×1KB=224KB=16G256\times256\times256\times1KB=2^{24}KB=16G256×256×256×1KB=224KB=16G

The maximum file size supported is about 16GB when they are added together.

Free Space Management

The above methods are used to manage the disk blocks used by files. The free disk blocks also need to be managed. In this way, it is convenient to operate when the disk blocks need to be allocated for a new file

Free area table method

A contiguous unallocated area on a disk is called a free area. The free area table belongs to the contiguous allocation mode. To do this, create a table to record the block number of the first free block and the number of contiguous free blocks.

Free linked list method

Each free block contains a pointer to the next free block, and the free blocks are strung together in this way. The first free block is stored in a special location on disk, and is also cached in memory.

When a file is created, several blocks are removed from the chain header and allocated to the file. When the file is deleted, the disk blocks occupied by the file are connected to the chain header.

Free block group list method

Neither method is suitable for large file systems, resulting in large free tables or long chained structures. The free grouping method, as the name suggests, first divides the free blocks into groups, and then strings the groups together using the linked list method.

The group of the block (generally the first) as a “pointer” to the next group, of course, the block is impossible to only store a pointer, otherwise it would be too wasteful, in fact, the block inside the next group to store the number of free blocks, block number, each free block number and other information.

Bitmap method

You might be familiar with the idea of using a bit to represent the usage of a disk block, with 1 for use and 0 for free. For a 1GB disk, the block size of 1KB1KB is 2202^{20}220 disk blocks. In this case, 220bit/8=128KB2^{20}bit /8= 128KB220bit/8=128KB.

Using bitmap method will increase the calculation burden of CPU.

EXT2

Take a look at a specific file system, EXT2, to get a sense of what’s going on, and see how it’s laid out on disk.

The overall layout

Let’s look at the overall structure layout directly:

Noun explanation

MBR/GPT, OS boot block: needless to say, see above for more details

Superblock: describes the file system information of the entire partition, inode/ data block size, total amount, usage, and remaining amount, file system mounting time, last data write time, and last disk check time.

Block group descriptor table: The block group descriptor table stores block group descriptors as many as there are block groups. Block group descriptors store the inode array, the starting position of the data area, the inode, and the total number of remaining blocks. There is a copy of the block group descriptor table in each block group

Block bitmap: indicates the block usage. Each bit indicates whether a disk block is available. Only one disk block is used.

Inode bitmap: indicates the usage of inodes. Each bit indicates whether an inode is free and occupies only one inode.

Inode array: the place where all the inodes are stored. The inode contains almost all the information about a file except the file name. The inode mainly includes the file size, owner group ID, read/write execution permission, timestamp and other property information, and also the data block pointer.

Data block: stores specific file content data

directory

The essence

Directories are also files, and unlike ordinary files, their data is file information, that is, directories are files that contain file information.

A directory is a table containing directory entries. It has three properties: file name, inode number, and file type. The inode contains most of the information about a file, but does not contain the name of the file. This property is indicated in the directory entry. The main purpose of a directory entry is to associate the inode with the file name.

Each directory file will contain at least two directory entries: the current directory… And the parent directory… , as shown in the following figure, the directory entries contained in the empty folder test:

Note that the parent directory of the root directory is its own, that is, the root directory of two directory entries. And.. It’s the same thing.

The path

So what’s the path? We often see a path representation like /OS/test in the figure above, but if you take a closer look, you’ll see that it’s just filenames separated by ‘/’.

There are two kinds of paths. One is the path that starts with ‘/’ on the left, which is called an absolute path. For example, /a/b/ c.tb.

The other is a relative path, which is relative to the current location and does not start with a ‘/’, such as a/ B.tb. In the current directory, look for a directory, and in a directory, look for b.tb files. C. t_t means to find the c. t_t file in the current directory. Generally, b.t_t and./ b.t_t mean the same thing. . / d.t_t indicates to search for d.t_t files in the parent directory of the current directory.

To find the

We’ve been talking about searching in the abstract, so let’s see how we can find the corresponding file according to the path. As mentioned above, if you look at the string of characters of the path, you will find that the path is actually a file name separated by the delimiter ‘/’. With the file name, it is easy to do. The directory entry stores the corresponding relationship between the file name and the inode number, so searching for a file is to find the corresponding inode number in the directory entry according to the file name. It can be divided into the following four steps:

  1. Look for the directory entry corresponding to the file name in the root directory or current directory
  2. Get the inode number from the directory entry and find the corresponding inode in the inode array
  3. Gets the block address of the file/directory from the inode
  4. Repeat the steps above to find the final required file


File operations

The inode stores information about the file itself, and we often have to work on files all the time, so we need to have a structure to record that information.

File structure and file table

This data structure is called struct file. When a file is opened, a file structure is generated. All of these file structures are centrally managed by the kernel to form a file table, and the file structure is the entry of the file table. The file table is global and not private to a process.

The file structure mainly contains information such as path, inode pointer, number of times opened, file offset location, and so on. I don’t want to explain it in detail.

File descriptors and open the file descriptor table

But shouldn’t the files that each process operates on be isolated? Struct file *fd_array[NR_OPEN_DEFAULT]) (struct file *fd_array[NR_OPEN_DEFAULT]) (struct file *fd_array[NR_OPEN_DEFAULT]]);

The index of this array is the file descriptor, which is the index of an array, so file descriptors are nonnegative integers.

Dup and dup2

These are two functions for file descriptors. The prototype of dUP is as follows:

int dup(int oldfd);
Copy the code

The function dup copies the file descriptor oldfd and returns a new descriptor, newfd, which has two characteristics:

  • Newfd is the smallest file descriptor currently available
  • The olDFd and newfd descriptors point to the same file entry

Dup2 is similar to DUP in that its function prototype is as follows:

int dup2(int oldfd, int newfd);
Copy the code

This function is similar in functionality to dUP, except that instead of returning the smallest available file descriptor, you can decide which file descriptor to return.

  • Return newfd if newfd is idle or equal to oldfd.
  • When newfd is not idle pointing to a file, close the file and return newfd.

File pointer and file stream pointer

The file pointer is the element that opens the file descriptor table, namely fd_array[I], and points to the file entry.

The c header FILE stdio.h defines a FILE structure with the same name as the FILE structure. Don’t confuse this with the FILE structure.

struct _iobuf {
        char *_ptr;      // The next location for file input
        int   _cnt;      // The relative position of the current buffer
        char *_base;     // refers to the base location (i.e., the initial location of the file)
        int   _flag;     // File flag
        int   _file;     // Validates the file, which is actually the file descriptor
        int   _charbuf;  // Check the state of the buffer, if there is no buffer, no read
        int   _bufsiz;   // Buffer size
        char *_tmpfname; // Temporary file name
        };
typedef struct _iobuf FILE;
Copy the code

The int _file up here is actually the FILE descriptor, so the C FILE structure is actually the wrapper around the FILE descriptor.

With that in mind, let’s take a look at some of the important things about file descriptors and file entries:

  • Open the file descriptor table in the process’s PCB, which is the task_struct.
  • Each time a file is opened, a file structure (file entry) is created, even if the same file is opened multiple times, whether it is opened multiple times by the same process or by different processes.
  • A file corresponds to one inode. A file can be opened multiple times, which corresponds to multiple file entries. Therefore, an inode corresponds to multiple file entries.
  • Different file descriptors for the same process can point to the same file entry using the DUP function
  • Fork The open file descriptor table of the parent process is copied when the child process is created, so the parent process shares the file entries.

Finally, let’s take a look at the schematic and start from the beginning:

Specific operation

For files, we maintain three data structures in memory: file descriptors, file structures, and in-memory inodes. Together, these three structures give us the tables described above. In-memory inodes are basically the same as those on disk.

Of course, some of the specific data in the file is cached in memory, so manipulating the cache of the file in memory and the structure maintained for it, and then synchronizing it to disk is what the system call about the file actually does.

Synchronization to disk can be thought of as a write to disk, with specific instructions in and out for read and write operations on the disk. You can use inline assembly and other methods to wrap these two instructions into functions for easy invocation.

Let’s look at the general principles of some specific file operations.

File creation: Each file corresponds to an inode. Creating a file requires the use of an inode, so request free inodes in the inode bitmap and inode array. Of course, you create the file itself, so you also need to apply for blocks in the block bitmap. Files must also belong to a directory, so add an entry to that directory.

Open file: The file is usually opened according to the given path and identification, so the path is first parsed, the inode of the file is obtained, the inode cache is loaded in memory, and then the file entry is created, and the file descriptor is created.

Close a file: reclaim the corresponding file descriptor. The number of open files in the file entry is reduced by one. If the number is reduced to 0, delete the file entry and reclaim the inode in the cache.

Reading and writing files: Basically reading and writing to disk, using the in and out instructions described above, but disk I/O is slow, so both require a cache relay to reduce I/O times. And of course there are other technologies like DMA, which I won’t expand here.

Lseek: sets the file offset property in the file entry.

Delete file: Basically the reverse of creating a file.

Create directory: allocate inode, allocate blocks, and add two directory entries to the new directory. And.. Is added to the parent directory of the new directory.

Deleting a directory: Basically the reverse of creating a directory

The rest of the operation of directories is similar to that of ordinary files, or all the operations of files on disk. Remember that the CPU does not exchange data directly with the disk. It is memory that deals with the CPU directly, so any operation on a file on disk has to be read into memory and then synchronized to disk.

Virtual file system

An operating system may support multiple file systems. Each file system may implement different file operations and management methods. To unify the file system, add a proxy virtual file system on top of the file system to display a unified interface.

Process, virtual file system, file system, and external storage are roughly related as follows:

Virtual files provide a unified interface for processes to call. This interface is usually a POSIX standard system call, such as open or write. Each file system provides actual functions for actual operations.

Unlike Windows, which allocates a letter to each partition as a drive letter, each partition is directly there. In Order for Linux to use a partition, it must be mounted to a directory.

Mounting is a method of accessing a file system, which is actually a process of associating a file system with a directory in the current directory tree. For example, if I want to mount the file system on a new partition under /a, the /a directory will display the information on the new partition, and I can then operate on the files on it.

This directory should be empty in principle, otherwise the contents in this directory will be temporarily hidden until uninstalled.

About the file system this article first said this, what mistake also please criticize and correct, also welcome everyone to exchange with me to discuss learning progress.