File System Concepts
- Introduction of the file system
Computer systems handle far more information than fits in memory, so most of it is kept in external storage in the form of files. In a multi-user system, the storage locations of different users' files must not conflict, and no single user may monopolize the external storage space. Users' files must also be protected from theft or damage by unauthorized users, while several users must still be able to share certain files under controlled conditions. A common management layer is therefore needed to administer the external storage space and the files stored in it. This is why the file system was introduced.
- The role of the file system
A file system is the collective name for the software in the operating system that manages files, the files being managed, and the data structures needed to manage them.
As the unified manager of information, the file system has the following functions:
- Manage file storage space (external storage) in a unified manner, allocating and reclaiming storage space.
- Determine where and in what form file information is stored.
- Map files from the name space to the external storage address space, that is, implement access to files by name.
- Implement file control operations (create, delete, open, close, and so on) and access operations (read, write, modify, copy, dump, and so on) efficiently.
- Implement file sharing and provide reliable confidentiality and protection measures.
File structure
File structure refers to the organization of files. It is usually divided into the logical structure of the file and the physical structure of the file.
The logical structure of a file is its organization as seen by users: the form in which users access, retrieve, and process the information in it.
The physical structure of a file is its internal organization, that is, how the file is stored on the physical storage device. Because the physical structure determines where file information is placed on the device, it also determines how a logical block number of the file is converted to a physical block number.
In addition, how files are accessed is also related to their physical structure. Different file systems generally correspond to different physical structures.
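As a rough illustration of how the physical structure determines the conversion from logical to physical block numbers, the sketch below compares two common layouts, contiguous allocation and indexed allocation. All struct and function names here are hypothetical and are not taken from any real file system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contiguous layout: the file occupies consecutive blocks. */
struct contiguous_file {
    unsigned long start_block;   /* first physical block of the file */
    unsigned long nblocks;       /* number of blocks in the file */
};

/* Hypothetical indexed layout: a table lists every physical block. */
struct indexed_file {
    const unsigned long *index;  /* index[i] = physical block of logical block i */
    unsigned long nblocks;
};

/* Both functions return 0 on success, -1 if the logical block is out of range. */
int contiguous_map(const struct contiguous_file *f, unsigned long logical,
                   unsigned long *physical)
{
    assert(f != NULL && physical != NULL);
    if (logical >= f->nblocks)
        return -1;
    *physical = f->start_block + logical;   /* simple offset from the start */
    return 0;
}

int indexed_map(const struct indexed_file *f, unsigned long logical,
                unsigned long *physical)
{
    assert(f != NULL && physical != NULL);
    if (logical >= f->nblocks)
        return -1;
    *physical = f->index[logical];          /* one table lookup */
    return 0;
}
```

The same logical block number yields a different physical block depending on the layout, which is why the conversion cannot be separated from the physical structure.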
Linux file system hierarchy
What is a virtual file system
To support different file systems, the operating system brings their operation and management into a unified framework. The implementation details of the individual file systems are hidden from user programs, which instead see a single abstract, virtual file system interface called the Virtual File System, or Virtual Filesystem Switch (VFS). For example, in Linux a DOS-formatted disk or partition, which is itself a file system, can be mounted into the system, and user programs can then access its files in exactly the same way as files on Ext2.
In general, virtual file systems are divided into three layers, as shown in the following figure:
The first layer is the file system interface layer, such as open, write, close and other system call interfaces.
The second layer is the Virtual File System (VFS) interface layer. It has two interfaces: one facing the user and one facing the specific file systems. The interface between the VFS and the user directs every file operation to the corresponding function of the specific file system. The interface between the VFS and a specific file system is implemented mainly through the VFS operations tables.
The third layer is the specific file system layer, which provides the structure and implementation of specific file systems, including network file systems such as NFS.
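The layering can be pictured with a small user-space sketch: a single entry point dispatches through a table of function pointers to whichever file system the file belongs to. This is only a toy model under stated assumptions. The names toy_fs_ops, toy_file, and toy_vfs_read are hypothetical, and the two "file systems" merely return fixed strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operations table supplied by each specific file system. */
struct toy_fs_ops {
    const char *fs_name;
    long (*read)(char *buf, long len);
};

/* Copy at most len bytes of data into buf; return the number copied. */
static long toy_copy(char *buf, long len, const char *data)
{
    long n = (long)strlen(data);
    if (n > len)
        n = len;
    memcpy(buf, data, (size_t)n);
    return n;
}

/* Third layer: two stand-in file system implementations. */
static long toy_ext2_read(char *buf, long len) { return toy_copy(buf, len, "ext2 data"); }
static long toy_nfs_read(char *buf, long len)  { return toy_copy(buf, len, "nfs data"); }

const struct toy_fs_ops toy_ext2_ops = { "ext2", toy_ext2_read };
const struct toy_fs_ops toy_nfs_ops  = { "nfs",  toy_nfs_read };

/* Hypothetical open file: remembers which operations table to use. */
struct toy_file {
    const struct toy_fs_ops *ops;
};

/* Second layer: one entry point; the ops table selects the implementation. */
long toy_vfs_read(struct toy_file *f, char *buf, long len)
{
    assert(buf != NULL && len >= 0);
    if (f == NULL || f->ops == NULL || f->ops->read == NULL)
        return -1;
    return f->ops->read(buf, len);
}
```

The caller of toy_vfs_read() never names a file system; it only holds a toy_file, much as a user program only holds a file descriptor.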
VFS data structure
- Superblock object. Represents a mounted file system and holds information about it. For a disk-based file system, this object corresponds to the file system control block stored on disk; there is one superblock object per file system.
- Inode object. Represents a file and stores general information about it. For a disk-based file system, this object typically corresponds to a file control block (FCB) stored on disk; there is one inode object per file.
- Dentry object. Represents one component of a path and stores the information that links a directory entry to its file. The VFS keeps recently used dentry objects in the dentry cache to speed up path name lookup and improve system performance.
- File object. Represents a file opened by a process and holds the information about the interaction between the open file and the process. It exists only in main memory and only while the process is accessing the file: the file object is created when the open() system call is executed and destroyed when close() is executed.
Each primary object contains an operations object that describes the methods the kernel can call on it.
- super_operations object. Contains the methods that the kernel can call on a particular file system.
- inode_operations object. Contains the methods that the kernel can call on a particular file.
- dentry_operations object. Contains the methods that can be called on a particular directory entry.
- file_operations object. Contains the methods that a process can call on an open file.
Superblock object
- A superblock object describes a file system. Each file system has its own superblock, such as the Ext2 superblock, which is stored in a specific sector of the disk.
- When the kernel initializes and registers a specific file system, it calls functions provided by that file system to allocate a VFS superblock and fill it with information from the file system's own superblock.
- A VFS superblock is created when a specific file system is mounted and deleted automatically when that file system is unmounted. VFS superblocks therefore exist only in main memory.
struct super_block {
struct list_head s_list; // Doubly linked list of all superblocks
dev_t s_dev; // Identifier of the device where the file system resides
unsigned long s_blocksize; // Block size in bytes
unsigned char s_blocksize_bits; // Block size as a power of 2 (12 for 4 KB blocks)
unsigned char s_dirt; // Dirty (modified) flag
loff_t s_maxbytes; // Upper limit on file size
struct file_system_type *s_type; // Pointer to the registered file_system_type structure
const struct super_operations *s_op; // Pointer to the set of superblock operations
unsigned long s_flags; // Mount flags
unsigned long s_magic; // File system magic number
struct dentry *s_root; // Dentry of the root directory of the mounted file system
struct rw_semaphore s_umount; // Unmount semaphore
struct mutex s_lock; // Superblock lock
int s_count; // Superblock reference count
struct list_head s_inodes; // List of all inodes
struct list_head s_files;
struct list_head s_dentry_lru; /* unused dentry lru */
int s_nr_dentry_unused; /* # of dentry on lru */
…
struct list_head s_instances; // List of superblocks of the same file system type
char s_id[32]; /* Text name */
void *s_fs_info; // Each specific file system private data structure
};
struct super_operations {
struct inode* (*alloc_inode) (struct super_block *sb); // Allocate an inode
void (*destroy_inode)(struct inode *); // Destroy an inode
void (*dirty_inode) (struct inode *); // Mark the inode as dirty
int (*write_inode) (struct inode *, int); // Write the given inode back to disk
void (*drop_inode) (struct inode *); // Release the inode logically
void (*delete_inode) (struct inode *); // The inode is physically released
void (*put_super) (struct super_block *); // Release the superblock object
void (*write_super) (struct super_block *);
int (*sync_fs)(struct super_block *sb, int wait);
int (*freeze_fs) (struct super_block *);
int (*unfreeze_fs) (struct super_block *);
int (*statfs) (struct dentry *, struct kstatfs *);
int (*remount_fs) (struct super_block *, int *, char *);
void (*clear_inode) (struct inode *); // Clear an inode
void (*umount_begin) (struct super_block *);
int (*show_options)(struct seq_file *, struct vfsmount *);
int (*show_stats)(struct seq_file *, struct vfsmount *);
int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
};
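Some of the superblock information can be observed from user space. The sketch below uses the Linux statfs() call, which reports the magic number and block size of the file system containing a given path, among other fields. The helper name print_fs_info is made up for this example, and the sketch assumes a Linux system with glibc.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/vfs.h>   /* statfs(), Linux-specific */

/* Print a few fields of the file system that contains path.
 * Returns 0 on success, -1 on error. */
int print_fs_info(const char *path)
{
    struct statfs st;

    assert(path != NULL);
    if (statfs(path, &st) != 0) {
        perror("statfs");
        return -1;
    }
    printf("magic number: 0x%lx\n", (unsigned long)st.f_type);
    printf("block size:   %ld bytes\n", (long)st.f_bsize);
    printf("total blocks: %llu\n", (unsigned long long)st.f_blocks);
    printf("free blocks:  %llu\n", (unsigned long long)st.f_bfree);
    return 0;
}
```

Calling print_fs_info("/") on the root file system and print_fs_info("/proc") on procfs should show different magic numbers, since each mounted file system has its own superblock.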
Inode object
- The inode object contains all the information the kernel needs to operate on a file or directory. A file's name can change, but its inode is unique to the file and exists for as long as the file does.
- An inode represents a file in the file system. The file may also be a special file such as a device or a pipe, so the inode contains fields specific to these cases.
- When a file is accessed for the first time, the kernel builds the corresponding inode object in memory so that it has all the information needed to operate on the file. Part of this information is read from specific locations on disk, and part is filled in dynamically when the inode is loaded.
struct inode {
struct hlist_node i_hash;
struct list_head i_list; /* backing dev IO list */
struct list_head i_sb_list;
struct list_head i_dentry;
unsigned long i_ino; // Index node number
atomic_t i_count; // Reference count
umode_t i_mode; // File type and access permissions
…
const struct inode_operations *i_op; // Inode operations pointer
const struct file_operations *i_fop; // File operations pointer
struct super_block *i_sb; // Pointer to the superblock
struct file_lock *i_flock; // Pointer to the file lock list
struct address_space *i_mapping; // Pointer to the associated address_space (page cache mapping)
struct address_space i_data; // address_space object for the device
struct list_head i_devices; // Device linked list
union {
struct pipe_inode_info *i_pipe; // Pipe inode information
struct block_device *i_bdev; // Points to the block device
struct cdev *i_cdev; // Points to the character device
};
…
};
struct inode_operations {
int (*create) (struct inode *,struct dentry *,int, struct nameidata *); // Create a new inode
struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *); // Look up the inode for a directory entry in a directory
int (*link) (struct dentry *,struct inode *,struct dentry *); // Create a hard link
int (*unlink) (struct inode *,struct dentry *); // Remove a hard link
int (*symlink) (struct inode *,struct dentry *,const char *); // Create an inode for a symbolic link
int (*mkdir) (struct inode *,struct dentry *,int); // Create an inode for a directory
int (*rmdir) (struct inode *,struct dentry *); // Remove the inode of a directory
int (*mknod) (struct inode *,struct dentry *,int, dev_t); // Create an inode for a special file
…
};
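The statement that a name can change while the inode stays the same can be checked from user space. The following sketch creates a file, adds a hard link to it, and compares the inode numbers reported by stat(). The function name and the temporary path template are chosen only for this example, and the sketch assumes a POSIX system where /tmp is writable and supports hard links.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a file plus a hard link to it and report whether both names
 * resolve to the same inode. Returns 1 if so, 0 if not, -1 on error. */
int hard_link_shares_inode(void)
{
    char dir[] = "/tmp/vfs_demo_XXXXXX";
    char a[64], b[64];
    struct stat sa, sb;
    int fd, n, result = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    n = snprintf(a, sizeof a, "%s/a", dir);
    assert(n > 0 && (size_t)n < sizeof a);
    n = snprintf(b, sizeof b, "%s/b", dir);
    assert(n > 0 && (size_t)n < sizeof b);

    fd = open(a, O_CREAT | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        /* link() adds a second name (directory entry) for the same inode. */
        if (link(a, b) == 0 && stat(a, &sa) == 0 && stat(b, &sb) == 0)
            result = (sa.st_ino == sb.st_ino && sa.st_dev == sb.st_dev &&
                      sa.st_nlink == 2);
        unlink(b);
        unlink(a);
    }
    rmdir(dir);
    return result;
}
```

Two different path names lead to one inode, and the inode's link count records how many names refer to it.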
Dentry object
- The VFS treats every directory as a file. For example, in the path /tmp/test, both tmp and test are files: tmp is a directory file and test is a regular file.
- In addition to an inode, each file has a dentry data structure associated with it, whose d_inode pointer points to the corresponding inode.
- The dentry structure speeds up locating files and so improves file system efficiency.
- A dentry describes the logical attributes of a file and has no corresponding image on disk. An inode records the physical attributes of a file and does have a corresponding image on disk.
struct dentry {
atomic_t d_count; // Dentry reference count
unsigned int d_flags; // Dentry status flags
spinlock_t d_lock; // Spinlock protecting the dentry
int d_mounted; // Whether this dentry is a mount point
struct inode *d_inode; // The inode associated with this dentry
struct hlist_node d_hash; // Linkage in the dentry hash table
struct dentry *d_parent; // Dentry of the parent directory
struct qstr d_name; // Name of the dentry, used for quick lookup
struct list_head d_lru; // Linkage in the LRU list of unused dentries
union {
struct list_head d_child; // Linkage in the parent's list of children
struct rcu_head d_rcu;
} d_u;
struct list_head d_subdirs; // List of this dentry's children
struct list_head d_alias; // List of dentries that alias the same inode
unsigned long d_time; // Revalidation time
const struct dentry_operations *d_op; // Set of dentry operations
struct super_block *d_sb; // Pointer to the file system's superblock
void *d_fsdata; // File-system-specific data
unsigned char d_iname[DNAME_INLINE_LEN_MIN]; // Inline storage for short file names
};
struct dentry_operations {
int (*d_revalidate)(struct dentry *, struct nameidata *); // Check whether the dentry is still valid
int (*d_hash) (struct dentry *, struct qstr *); // Generate a hash value
int (*d_compare) (struct dentry *, struct qstr *, struct qstr *); // Compare two file names
int (*d_delete)(struct dentry *); // Delete a dentry whose d_count is 0
void (*d_release)(struct dentry *); // Release a dentry object
void (*d_iput)(struct dentry *, struct inode *); // Release the inode of the dentry
};
File object
- A file object has no image on disk. It is a file structure that is created when the file is opened.
- The main piece of information in a file object is the file pointer, that is, the current position in the file at which the next operation will take place.
- Besides the current position, the file structure holds a pointer to the file's inode. File structures are also linked into a doubly linked list, called the system open file table.
struct file {
union {
struct list_head fu_list; // List of file objects
struct rcu_head fu_rcuhead;
} f_u;
struct path f_path; // Contains the dentry and vfsmount of the file
const struct file_operations *f_op; // Set of file operations
spinlock_t f_lock; /* f_ep_links, f_flags, no IRQ */
atomic_long_t f_count; // File object reference count
unsigned int f_flags;
fmode_t f_mode; // File access mode
loff_t f_pos; // Current file position
struct fown_struct f_owner;
const struct cred *f_cred;
struct file_ra_state f_ra;
u64 f_version;
/* needed for tty driver, and maybe others */
void *private_data;
struct address_space *f_mapping;
};
struct file_operations {
struct module *owner;
loff_t (*llseek) (struct file *, loff_t, int); // Modify the file pointer
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); // Read bytes from the file starting at the given offset
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *); // Write bytes to the file at the given offset
ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t); // Asynchronously read bytes from the file starting at the given offset
ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t); // Asynchronously write bytes to the file at the given offset
int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); // Send a command to a hardware device
int (*flush) (struct file *, fl_owner_t id); // Flush the file when it is closed
int (*release) (struct inode *, struct file *); // Release the file object
int (*fsync) (struct file *, struct dentry *, int datasync); // Write the file's cached data back to disk
…
};
File system related
File system types are distinguished by the physical medium on which the file system resides and by the way data is organized on that medium. The file_system_type structure describes a file system type. Every file system supported by Linux has exactly one file_system_type structure, regardless of whether zero or more instances of it are mounted on the system.
struct file_system_type {
const char *name; /* The name of the file system */
struct subsystem subsys; /* SysFS subsystem object */
int fs_flags; /* File system type flag */
/* When the file system is mounted, read the superblock from disk and assemble the superblock object in memory */
struct super_block* (*get_sb) (struct file_system_type*,
int, const char*, void *);
void (*kill_sb) (struct super_block *); /* Terminate access to the superblock */
struct module *owner; /* File system module */
struct file_system_type * next; /* The next file system type in the linked list */
struct list_head fs_supers; /* Lists of superblock objects with the same filesystem type */
};
Each time a file system is actually mounted, a vfsmount structure is created; it corresponds to one mount point.
struct vfsmount
{
struct list_head mnt_hash; /* Hash table */
struct vfsmount *mnt_parent; /* Parent file system */
struct dentry *mnt_mountpoint; /* Dentry of the mount point */
struct dentry *mnt_root; /* Root dentry of this file system */
struct super_block *mnt_sb; /* The file system's superblock */
struct list_head mnt_mounts; /* List of child mounts */
struct list_head mnt_child; /* Linkage in the parent's mnt_mounts list */
atomic_t mnt_count; /* Use count */
int mnt_flags; /* Mount flags */
char *mnt_devname; /* Device file name */
struct list_head mnt_list; /* Descriptor list */
struct list_head mnt_fslink; /* Expiration list of specific file systems */
struct namespace *mnt_namespace; /* Associated namespace */
};
Process-related
- files_struct is the per-process (user) open file table.
- fs_struct records the current working directory and the root directory of a process.
struct task_struct {
…
struct fs_struct *fs; /* File system information */
struct files_struct *files; /* Information about the currently open files */
…
};
A file descriptor (fd) identifies an open file. Each process records its file descriptor usage in a files_struct structure, called the user open file table. A pointer to this structure is stored in the files member of the process's task_struct.
struct files_struct {
atomic_t count; // Number of processes sharing this table
struct fdtable *fdt; // Pointer to the current file descriptor table (initially fdtab)
struct fdtable fdtab; // Embedded file descriptor table
spinlock_t file_lock ____cacheline_aligned_in_smp; // Lock protecting this structure
int next_fd; // Next free fd
struct embedded_fd_set close_on_exec_init; // Initial set of file descriptors to close on exec
struct embedded_fd_set open_fds_init; // Initial set of open file descriptors
struct file * fd_array[NR_OPEN_DEFAULT]; // Initial array of pointers to file objects
};
The fd field of fdtable points to an array of pointers to file objects; the length of this array is stored in the max_fds field. Normally fd points to the fd_array field of the files_struct structure, which holds NR_OPEN_DEFAULT (32 by default) pointers to file objects. If a process opens more than NR_OPEN_DEFAULT files, the kernel allocates a new, larger array of file pointers, stores its address in the fd field, and updates max_fds.
struct fdtable {
unsigned int max_fds; // Current capacity of the fd array
struct file ** fd; // Array of pointers to file objects
fd_set *close_on_exec; // File descriptors to be closed on exec()
fd_set *open_fds; // File descriptors currently open
struct fdtable *next; // Next file descriptor table
};
Each open file has an entry in the fd array, and the index of that entry is the file descriptor. Normally the first element of the array (index 0), the second element (index 1), and the third element (index 2) are the standard input, standard output, and standard error files, respectively, and these three files are usually inherited from the parent process. With the appropriate system calls, two file descriptors can refer to the same open file, that is, two elements of the array can point to the same file object.
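The last point can be observed with dup(). In the sketch below, data is written through one descriptor and the file position is then read through its duplicate. Because both descriptors refer to one file object, they share a single current position. The function name and the temporary file template are chosen only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Write 5 bytes through one descriptor, then read the file position
 * through a duplicate. Returns that position, or -1 on error. */
long offset_seen_through_dup(void)
{
    char path[] = "/tmp/vfs_dup_XXXXXX";
    int fd = mkstemp(path);
    int fd2;
    long off = -1;

    if (fd < 0)
        return -1;
    unlink(path);              /* the file goes away once both fds are closed */

    fd2 = dup(fd);             /* second descriptor, same file object */
    if (fd2 >= 0) {
        assert(fd2 != fd);     /* two distinct indexes in the fd array */
        if (write(fd, "hello", 5) == 5)
            off = (long)lseek(fd2, 0, SEEK_CUR);
        close(fd2);
    }
    close(fd);
    return off;
}
```

The function returns 5: the write through fd advanced the position that fd2 also sees. Two independent open() calls on the same path would instead create two file objects with separate positions.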
For file lookups, each process has a current working directory and a root directory, both recorded in its fs_struct structure.
struct fs_struct { // Establish the relationship between the process and the file system
atomic_t count; /* Structure usage count */
rwlock_t lock; /* The lock that protects the structure */
int umask; /* Default file creation permission mask */
struct dentry * root; /* The root directory entry object */
struct dentry * pwd; /* The directory entry object of the current working directory */
struct dentry * altroot; /* Alternative root directory entry object */
struct vfsmount * rootmnt; /* Mount object of the root directory */
struct vfsmount * pwdmnt; /* Mount object of the current working directory */
struct vfsmount * altrootmnt; /* Mount object of the alternative root directory */
};
Logical structure of Linux file system
Disks and file systems
Path lookup
Basic description
The idea behind the open operation in the kernel is simple: starting from the path supplied by user space, the kernel looks up the file one component at a time. If the file exists, the kernel creates a file structure for it, links that structure into the process's files array, and returns the array index to the user as the file descriptor.
Path lookup resolves a given file path hierarchically, one directory entry at a time. It mainly involves the following:
- Determine the starting position of the lookup, for example current->fs->pwd or current->fs->root.
- Check whether the current process has permission to access the inode associated with the directory entry.
- Starting from the current directory entry, look up the next-level directory entry. The lookup may go down to a child or up to the parent directory (for example, when the next component is “..”).
- Handle mount points: if the current directory entry is a mount point, the lookup must cross from one file system to another.
- Handle symbolic links: if the current directory entry is a symbolic link, the lookup must follow it to the file it actually points to.
- Find and create missing parts of the path. For example, when a new file is created through open(), some directory entries in the path may not exist yet.
Of these, item 1 is the initial step of the path lookup; items 2 to 6 are carried out for each directory entry as the lookup proceeds.
Do_sys_open () is responsible for the basic implementation of the open system call, and the internal do_filp_open function is responsible for most of the implementation of open, including path finding.
Sequence diagram of sys_open() function
- First, get_unused_fd_flags() obtains an available file descriptor. This function shows that a file descriptor is simply the index of a file object in the process's open file table.
- Next, do_filp_open() opens the file and returns a file object representing the file opened by the process. The process reads and writes the physical file through this data structure.
- Finally, fd_install() associates the file descriptor with the file object. From then on, the process reads and writes the file through the file descriptor.
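The three steps can be mimicked with a toy user-space model. This is not kernel code: the names toy_files, toy_get_unused_fd, toy_fd_install, and toy_open are hypothetical, the table has a fixed size, and the locking and descriptor reservation that the real kernel performs are left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_OPEN 8   /* hypothetical table size, far smaller than the kernel's */

struct toy_file {                    /* stand-in for struct file */
    const char *path;
};

struct toy_files {                   /* stand-in for files_struct */
    struct toy_file *fd[TOY_NR_OPEN];
};

/* Step 1: find the lowest unused descriptor, or -1 if the table is full. */
int toy_get_unused_fd(const struct toy_files *files)
{
    assert(files != NULL);
    for (int i = 0; i < TOY_NR_OPEN; i++)
        if (files->fd[i] == NULL)
            return i;
    return -1;
}

/* Step 3: bind the descriptor to the file object. */
void toy_fd_install(struct toy_files *files, int fd, struct toy_file *f)
{
    assert(fd >= 0 && fd < TOY_NR_OPEN && files->fd[fd] == NULL);
    files->fd[fd] = f;
}

/* The three steps together; f plays the role of the result of step 2. */
int toy_open(struct toy_files *files, struct toy_file *f)
{
    int fd = toy_get_unused_fd(files);

    if (fd < 0 || f == NULL)
        return -1;
    toy_fd_install(files, fd, f);
    return fd;
}
```

With descriptors 0, 1, and 2 already taken, the next toy_open() returns 3, and a freed slot is reused before higher indexes. This matches the familiar behavior that open() returns the lowest available descriptor.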
path_init()
path_init() sets the starting position of the path lookup, mainly by initializing the nd variable.
If flags has the LOOKUP_ROOT flag set, path_init() was called on behalf of open_by_handle_at(), which specifies a path to serve as the root. This is a special case and is not analyzed here. Otherwise, path_init() sets nd in one of three ways:
- If the path name starts with /, the path is absolute, and set_root() is used to set nd. Otherwise the path name is relative.
- If dfd is AT_FDCWD, the relative path starts from the current working directory pwd, so nd is set from pwd.
- If dfd is not AT_FDCWD, the starting directory of the relative path was specified by the user. The directory is obtained through dfd and then used to set nd.
Cases 2 and 3 both mean that the path to be opened is relative, but they differ slightly. Case 2 is the usual default open operation, while case 3 corresponds to the openat system call. The difference shows up in the value of the dfd parameter that the different open system calls pass to do_sys_open().
In every case the nd variable, which is of type nameidata, is set. In path_init(), the last_type field of nd is set to LAST_ROOT by default.
In case 1, path_init() updates the root field of nd from the fs->root field of the current process, and the path field of nd is set to the same location. In case 2, the path field of nd is updated from fs->pwd of the current process. In case 3, the file structure of the specified directory is obtained from the file descriptor dfd, and the path field of nd is updated from the f_path field of that file. Note that the root field is not set in cases 2 and 3. Finally, the inode field of nd is updated with path.dentry->d_inode.
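From user space, the relative-path cases correspond to the dirfd argument of openat(). The sketch below shows both: a relative name resolved against an explicit directory descriptor, and a relative name resolved against the current working directory through AT_FDCWD. The function names and the temporary directory template are chosen only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Case 3: create "note" in a fresh temporary directory by resolving the
 * relative name against a directory descriptor. Returns 0 on success. */
int open_relative_to_dirfd(void)
{
    char dir[] = "/tmp/vfs_at_XXXXXX";
    int dfd, fd, rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;

    dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd >= 0) {
        /* The lookup starts at the directory behind dfd, not at the cwd. */
        fd = openat(dfd, "note", O_CREAT | O_WRONLY, 0600);
        if (fd >= 0) {
            close(fd);
            rc = (faccessat(dfd, "note", F_OK, 0) == 0) ? 0 : -1;
            unlinkat(dfd, "note", 0);
        }
        close(dfd);
    }
    rmdir(dir);
    return rc;
}

/* Case 2: a relative path resolved from the current working directory.
 * open(path, flags) behaves like openat(AT_FDCWD, path, flags). */
int open_relative_to_cwd(const char *path)
{
    assert(path != NULL);
    return openat(AT_FDCWD, path, O_RDONLY);
}
```

An absolute path passed to openat() ignores the descriptor argument altogether, which corresponds to case 1.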
link_path_walk
Link_path_walk () is used to traverse each directory entry step by step. The function declaration is as follows:
static __always_inline int link_path_walk(const char *name, struct nameidata *nd)
The core of the function is a loop. Before entering the loop, if the path name is absolute, the function first skips any extra / characters at the start of the path.
In the loop, the work to be done includes the following:
- The loop uses several variables: next, of type path, points to the next directory entry; name points to the part of the path still to be searched; this, of type qstr, represents the current directory entry in the path, including its hash value; type indicates the type of the current directory entry.
- If necessary, update the hash value of the current directory entry and save it in this.
- If the current directory entry is “.”, type is LAST_DOT; if it is “..”, type is LAST_DOTDOT; otherwise type defaults to LAST_NORM.
- If the current directory entry is followed by more than one consecutive / separator (as in /home///edsionte), skip the extras so that name points to the last /.
- walk_component() handles the current directory entry, updating nd and next. If the current directory entry is a symbolic link, only next is updated.
- If the current directory entry is a symbolic link, nested_symlink() processes it and updates nd.
- If all directory entries in name have been traversed, the loop ends; otherwise it proceeds to the next iteration.
This loop searches the user-specified path name from beginning to end. At that point nd holds the information about the last directory entry, but the kernel has not yet determined whether that last entry actually exists; this is done in do_last().
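The component-splitting part of this loop can be sketched in user space. The code below is a simplified model and not the kernel implementation. It only splits a path into components, skips repeated / separators, and classifies each component. The names toy_component, toy_last_type, and toy_path_walk are hypothetical, and hashing, permission checks, mount points, and symbolic links are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_last_type { TOY_LAST_NORM, TOY_LAST_DOT, TOY_LAST_DOTDOT };

struct toy_component {
    const char *name;          /* points into the original path string */
    size_t len;                /* length of this component */
    enum toy_last_type type;
};

/* Split path into at most max components, skipping repeated '/'.
 * Returns the number of components stored in out. */
int toy_path_walk(const char *path, struct toy_component *out, int max)
{
    int count = 0;

    assert(path != NULL && out != NULL);
    while (*path == '/')                  /* drop leading slashes */
        path++;

    while (*path != '\0' && count < max) {
        size_t len = strcspn(path, "/");  /* length up to the next '/' */
        enum toy_last_type type = TOY_LAST_NORM;

        if (len == 1 && path[0] == '.')
            type = TOY_LAST_DOT;
        else if (len == 2 && path[0] == '.' && path[1] == '.')
            type = TOY_LAST_DOTDOT;

        out[count].name = path;
        out[count].len = len;
        out[count].type = type;
        count++;

        path += len;
        while (*path == '/')              /* collapse runs such as "///" */
            path++;
    }
    return count;
}
```

For the path /home///edsionte the function stores two components, home and edsionte. The real loop would call walk_component() at the point where this sketch records each entry.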
Conclusion
The VFS, or virtual file system, is an abstract software layer within the Linux file system. With its support, many different actual file systems can coexist in Linux, and operations across file systems become possible. Using its four main data structures (superblocks, inodes, dentries, and file objects) together with some auxiliary data structures, the VFS offers the same interface for opening, reading, and closing files, directories, devices, sockets, and so on. Only when control passes to the actual file system does that file system distinguish between file types and perform different operations for each. The VFS thus enables cross-file-system operation and realizes the Unix/Linux slogan “Everything is a file”.