Preface
File management is one of the functions of the operating system. Because main memory has limited capacity and cannot hold data for long periods, data is usually stored on external storage in the form of files and loaded into memory when needed.
Files and file systems
Definition of file
A file system carries out its management function by organizing the programs and data it manages into a series of files. A file is a collection of related elements with a file name. An element is usually a record, and a record is a collection of meaningful data items. Data is therefore organized at three levels: data items, records, and files.
- Data items. A data item is the lowest level of data organization. Data items are divided into basic data items and combined data items. A basic data item describes a single attribute of an object; it is the smallest logical unit of data that can be named, that is, atomic data, and is also called a data element or field. A combined data item is composed of several basic data items.
- Records. A record is a set of related data items that describes the attributes of an object in some respect. To identify a record uniquely, one or more data items in the record are designated as its keyword. A keyword is a data item, or set of data items, that uniquely identifies a record.
- Files. A file is a set of related elements with a file name and is the largest data unit in a file system. Logically, files are divided into structured files, which consist of a group of related records and are also called record files, and unstructured files, which can be regarded as a character stream (for example a binary file or a character file) and are also called stream files.
File properties
Every file has a set of properties. They vary from system to system but usually include the following:
- Name: The file name is unique and is stored in a human-readable form.
- Identifier: A unique label, usually a number, that identifies the file within the file system. It is an internal name that is not meant to be read by users.
- Type: Needed by file systems that support different file types.
- Location: A pointer to the device and to the file’s location on that device.
- Size: The current size of the file (in bytes, words, or blocks), possibly together with the maximum allowed size.
- Protection: Access control information that protects the file.
- Time, date, and user ID: Information about the file’s creation, last modification, and last access, used for protection and for tracking file usage.
In summary, all file information is stored in the directory structure, and the directory structure is itself stored on external storage. Files and their related information are loaded into memory as needed. In general, a directory entry contains the file name and the file’s unique identifier, and the identifier in turn locates the other attributes.
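As an illustration of these properties, the sketch below asks the operating system for a file's attributes through Python's standard `os.stat` call. The helper name `describe` and the file name `notes.txt` are made up for this example, and the fields shown are the ones a POSIX-style system reports; other systems may expose different attributes.

```python
import os
import stat
import tempfile

def describe(path):
    """Return a few attributes of `path` as reported by the operating system."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),            # human-readable name
        "identifier": st.st_ino,                   # internal identifier (inode number on POSIX)
        "size": st.st_size,                        # current size in bytes
        "protection": stat.filemode(st.st_mode),   # permission string such as '-rw-r--r--'
        "owner_uid": st.st_uid,                    # user ID of the owner
        "modified": st.st_mtime,                   # last modification time (seconds since epoch)
        "accessed": st.st_atime,                   # last access time (seconds since epoch)
    }

# Create a small scratch file and look at its attributes.
with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "notes.txt")
    with open(path, "w") as f:
        f.write("hello")
    info = describe(path)
    print(info["name"], info["size"], info["protection"])
```

Note that the name comes from the path (the directory entry), while everything else comes from the structure that the identifier points to.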
File system model
A file system is the part of the operating system’s software that deals with file management, together with the collection of files and file attributes that this software manages. The file system model can be divided into the following layers:
- Objects and their properties: The objects managed by the file system include files, directories, and disk (or tape) storage space.
- Software collection for object manipulation and management: this is the core part of a file system. It implements file storage space management, file directory management, file logical address to physical address conversion, file read and write management, file sharing and protection, and other functions.
- Interfaces of a file system: A file system provides command interfaces, program interfaces, and graphical user interfaces for users to use the file system.
File manipulation
- Create a file. When a new file is created, the system allocates the necessary external storage space for it and creates a directory entry for it in the file system’s directory. The directory entry records the file name and the external storage address of the new file.
- Delete a file. A file that is no longer needed can be deleted from the file system. The system clears the file’s directory entry and reclaims the storage space the file occupied.
- Read a file. The system call specifies the file name and the destination address in memory. The system searches the directory, finds the file’s directory entry, and obtains the file’s location on external storage. The directory entry also holds a pointer that is used for reading and writing the file.
- Write a file. The system call specifies the file name and the source address in memory. The system searches the directory, finds the file’s directory entry, and writes starting at the write pointer recorded in that entry.
- Set the read/write pointer. This operation sets the position of a file’s read/write pointer, so that a read or write does not have to start at the beginning of the file but can start at the specified position. In this way sequential access can be changed to random access.
- Open a file. Opening a file copies the property information of the specified file into memory and returns a pointer to that in-memory information. When the user later operates on the file, the system finds the external storage address and other attributes directly in memory, which makes file operations significantly faster.
- Close a file. When the user no longer needs access to an open file, the file can be closed. The system removes the file’s property information from memory and, if it has been modified, writes it back to external storage. To access the file again after closing it, the user must open it again.
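The operations above map closely onto the low-level file calls that most operating systems expose. The following is a minimal sketch using Python's `os` module; the function name `demo_file_operations` and the file name `demo.dat` are invented for the example, and the byte values are arbitrary.

```python
import os
import tempfile

def demo_file_operations(directory):
    path = os.path.join(directory, "demo.dat")

    # Create + open: the returned descriptor refers to the open file.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.write(fd, b"0123456789")     # write at the current pointer
        os.lseek(fd, 4, os.SEEK_SET)    # set the read/write pointer to offset 4
        middle = os.read(fd, 3)         # read continues from the pointer
        os.lseek(fd, 0, os.SEEK_SET)    # move the pointer back to the start
        os.write(fd, b"AB")             # overwrite the first two bytes
        os.lseek(fd, 0, os.SEEK_SET)
        whole = os.read(fd, 100)        # read the whole (short) file
    finally:
        os.close(fd)                    # close: the descriptor is no longer valid
    os.remove(path)                     # delete: remove the directory entry
    return middle, whole, os.path.exists(path)

with tempfile.TemporaryDirectory() as scratch:
    middle, whole, still_there = demo_file_operations(scratch)
    print(middle, whole, still_there)
```

Setting the pointer with `os.lseek` before a read or write is what turns sequential access into random access.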
Logical structure of the file
For any file, there are two forms of structure:
- The logical structure of a file is the file’s organization as seen from the user’s point of view: the data and structure that users can process directly, independent of the physical characteristics of storage. It is also called file organization.
- The physical structure of a file, also known as its storage structure, is the way the file is organized on external storage. It depends not only on the storage medium but also on how external storage is allocated.
From the perspective of logical structure, files can be divided into structured files and unstructured files.
Unstructured file
An unstructured file is the simplest form of file organization, and its length is measured in bytes. Because it has no structure, a record can only be located by exhaustive search, so this form is unsuitable for most record-oriented applications. On the other hand, a character-stream file is easy to manage and easy for users to operate on. Files whose basic information units are rarely operated on individually are therefore well suited to the unstructured form, for example source program files, executable files, and library functions.
Structured file
A structured file is composed of one or more records, so it is also called a record file. Each record describes one aspect of an entity, and records may contain the same or different numbers of data items. Records are either fixed-length or variable-length. With fixed-length records, all records in the file have the same length, and every data item sits at the same position in each record, in the same order and with the same length. With variable-length records, the records in the file do not all have the same length, for example because they contain different data items. According to how the records are organized, structured files can be divided into:
- Sequential file
Records in a file are ordered (logically) one after the other, and may be of fixed or variable length. Individual records can be physically stored sequentially or chained.
Sequential files have two structures.
- Index file
Fixed-length records make both sequential access and direct access easy to implement, but direct access is hard to achieve with variable-length records. The solution is to build an index table for a file of variable-length records. For each record in the main file, the index table has an entry that stores the record’s length m and a pointer ptr to the record (its starting address in the logical address space). Because the index table is sorted by record key, the index table itself is a sequential file of fixed-length records.
To retrieve a record from an index file, the system takes the keyword supplied by the user, searches the index table by binary search to find the matching entry, and then uses the pointer in that entry to access the required record. Every time a new record is added to the index file, the index table must be updated. The drawback of the index file is that an index table must be kept in addition to the main file, and every record needs its own index entry, which increases the storage cost.
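The retrieval procedure for an index file can be sketched as follows. This is a toy in-memory model, not a real on-disk format: the main file is a byte string of variable-length records, each index entry is a tuple `(key, ptr, m)`, and the helper names `build_index_file` and `lookup` are invented for the example.

```python
from bisect import bisect_left

def build_index_file(records):
    """records: dict mapping key -> bytes (variable-length record).

    Returns (main_file, index_table), where each index entry is
    (key, ptr, m): the key, the record's start offset, and its length.
    """
    main_file = bytearray()
    index_table = []
    for key, data in records.items():
        index_table.append((key, len(main_file), len(data)))
        main_file += data               # records are stored in arrival order
    index_table.sort()                  # the index table is ordered by key
    return bytes(main_file), index_table

def lookup(main_file, index_table, key):
    """Binary-search the index table, then follow the pointer."""
    keys = [entry[0] for entry in index_table]
    i = bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None                     # no such key
    _, ptr, m = index_table[i]
    return main_file[ptr:ptr + m]

records = {"li": b"Li,23", "an": b"An,31,Beijing", "wu": b"Wu"}
main_file, index_table = build_index_file(records)
print(lookup(main_file, index_table, "an"))
```

Each index entry has the same shape regardless of how long its record is, which is why the index table can be searched like a file of fixed-length records.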
- Indexed sequential file
Indexed sequential files combine the sequential and indexed organizations. An indexed sequential file divides all the records of a sequential file into several groups and builds an index table for the file. The index table has one index entry for the first record of each group, containing that record’s key value and a pointer to the record.
In the figure, each record in the main file contains a name and other data items, and the name serves as the keyword. For the first record of each group (not for every record), the index table stores the keyword value together with a pointer to that record’s starting position in the main file. The index table therefore contains only two data items, keyword and pointer, and its name keywords are arranged in ascending order. The records in the main file are arranged in groups; keywords within a group may be out of order, but the groups themselves must be ordered by keyword. To look up a record, first find its group through the index table, then search sequentially within that group.
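The two-step lookup can be sketched in a few lines. This example makes simplifying assumptions that are not in the text: the main file is an in-memory list of `(key, data)` pairs, every group has the same size, and the first record of each group carries the smallest key of that group, so that the index can be searched by key. The names `build_index` and `lookup` are invented for the example.

```python
from bisect import bisect_right

def build_index(main_file, group_size):
    """One index entry per group: (key of the group's first record, position)."""
    return [(main_file[pos][0], pos) for pos in range(0, len(main_file), group_size)]

def lookup(main_file, index, group_size, key):
    first_keys = [k for k, _ in index]
    g = bisect_right(first_keys, key) - 1       # last group whose first key <= key
    if g < 0:
        return None                             # key sorts before the first group
    start = index[g][1]
    for k, data in main_file[start:start + group_size]:   # sequential scan in the group
        if k == key:
            return data
    return None

main_file = [("An", "a"), ("Bao", "b"), ("Chen", "c"), ("Ding", "d"), ("Fang", "f")]
index = build_index(main_file, group_size=2)
print(index)
print(lookup(main_file, index, 2, "Ding"))
```

The index here has three entries for five records; with larger groups the index shrinks further, at the price of a longer sequential scan inside each group.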
- Direct files and hash files
The direct file directly obtains the physical address of the specified record according to the given record key value, that is, the record key value itself determines the physical address of the record. This conversion from record key value to record physical address is called key-value conversion.
Hash files use a hash function to convert a record’s key value into the corresponding record address. To allow file storage space to be allocated dynamically, the hash function usually yields a pointer to an entry in a directory table rather than the address of the record itself. The content of that entry points to the physical block where the record resides.
Note: Direct files and hash files are different from sequential files and index files in that they do not have sequential properties.
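A hash file with a directory table can be modelled roughly as below. This is a sketch under stated assumptions: the class name `HashFile` is invented, `zlib.crc32` merely stands in for whatever hash function a real system would use, and a Python list plays the role of a physical block that is allocated only when its table entry is first used.

```python
import zlib

class HashFile:
    def __init__(self, table_size=8):
        # Directory table: each entry is None or points to a block of records.
        self.table = [None] * table_size

    def _slot(self, key):
        # Hash function: key value -> index of a directory-table entry.
        return zlib.crc32(key.encode("utf-8")) % len(self.table)

    def put(self, key, record):
        slot = self._slot(key)
        if self.table[slot] is None:
            self.table[slot] = []               # allocate the block on first use
        block = self.table[slot]
        for i, (k, _) in enumerate(block):
            if k == key:
                block[i] = (key, record)        # replace an existing record
                return
        block.append((key, record))

    def get(self, key):
        block = self.table[self._slot(key)]
        if block is None:
            return None
        for k, record in block:                 # keys that collide share a block
            if k == key:
                return record
        return None

f = HashFile()
f.put("alice", {"age": 30})
f.put("bob", {"age": 25})
print(f.get("alice"), f.get("carol"))
```

Because the table entry is reached directly from the key, no ordering of the records is needed, which matches the note above that these files have no sequential property.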
File directory
File control block
To access files correctly, the system must set up, for each file, a data structure that describes and controls the file, called the file control block (FCB). The file management program uses the information in the FCB to perform operations on the file. Files and file control blocks correspond one to one, and the ordered set of file control blocks is called the file directory; that is, a file control block is a file directory entry. A file directory can itself be regarded as a file, called a directory file. The FCB contains the following information:
- Basic information: the file name, the physical location of the file, the logical structure of the file (whether it is a stream file or a record file, the number of records, and whether the records are fixed-length or variable-length), and the physical structure of the file (whether it is a sequential file, a linked file, or an indexed file).
- Access control information: the access permissions of the file owner, of approved users, and of general users.
- Usage information: the file’s creation time, modification time, and so on.
Index node
A file directory is usually stored on disk. When there are many files, the file directory may occupy a large number of disk blocks. To search the directory, the system loads the first disk block of the directory into memory and compares the file name given by the user with the file name in each directory entry. If the file is not found, the next disk block of the directory is loaded. During this search only the file name is used. Only when a matching directory entry is found does the system need to read the file’s physical address from that entry. The other information that describes the file is not used during the search, so there is no need to bring it into memory along with the names. For this reason, some systems, such as UNIX, separate the file name from the file description. The description is kept in a separate data structure called the index node, and a directory entry holds only the file name and a pointer to the index node.
Directory structure
How the directory structure is organized affects the access speed of the file system as well as the sharing and security of files. The commonly used directory structures are the single-level directory, the two-level directory, and the multi-level directory.
- Single-level directory structure: Only one directory table is created for the entire system, and each file occupies one directory entry. The entry contains the file name, file extension, file length, file type, physical address of the file, and a status bit indicating whether the entry is free. To create a new file, the system first checks all entries to make sure the new file name is unique in the directory, then finds a free entry in the directory table, fills in the file name and other information, and sets the status bit to 1. To delete a file, the system finds the file’s entry in the directory, reclaims the storage space the file occupies, and then clears the entry. A single-level directory is simple and implements the basic function of directory management, access by name. Its disadvantages are slow search (finding a directory entry takes a long time), no duplicate names (no two files in the directory table may share a name, yet with many users name clashes are hard to avoid), and inconvenient file sharing (each user has their own naming conventions, so different users should be able to access the same file under different names, which a single-level directory does not allow).
- Two-level directory structure: A separate user file directory (UFD) is created for each user. The UFDs have similar structures and consist of the file control blocks of the files that the user owns. In addition, the system keeps a master file directory (MFD), in which each user directory file has one entry containing the user name and a pointer to that user’s directory file.
The two-level directory structure overcomes the disadvantages of the single-level directory and has the following advantages. Directory retrieval is faster: if the master directory has n user directories and each user directory has at most m entries, finding a given entry requires searching at most n + m entries. The same file name can be used in different user directories: as long as each file name is unique within the user’s own UFD, different users can have files with the same name. Different users can also use different file names to access the same shared file in the system. However, when several users need to cooperate on a large task, it is inconvenient for them to share files with one another.
- Multi-level directory structure: A large file system uses a directory structure with three or more levels to improve directory search speed and file system performance. The multi-level directory structure is also called the tree directory structure: the main directory is the root, data files are the leaves, and the other directories are the interior nodes of the tree.
In the tree directory structure, a user can create their own UFD and create subdirectories within it. When creating a new file, the system only needs to check that no file in the target directory (the UFD or one of its subdirectories) has the same name; if none does, a new directory entry is added there. How a directory is deleted depends on the situation. If the directory to be deleted is empty, it is simply deleted and its entry in the parent directory is cleared. If the directory is not empty, one of two policies can be used. Under the first policy, non-empty directories cannot be deleted: all files in the directory must be deleted first so that it becomes empty, and only then can it be deleted; if it contains subdirectories, the deletion is applied to them recursively. Under the second policy, a non-empty directory can be deleted, and all the files and subdirectories it contains are deleted along with it.
Directory query technology
When a user wants to access an existing file, the system first uses the file name supplied by the user to search the directory and find the file’s file control block or index node. Then, from the physical address (disk block number) recorded in the FCB or index node, it works out the file’s physical location on disk. Finally, the disk driver reads the required file into memory. The commonly used directory search methods are linear search and the hash method.
- Linear search is also called sequential search. In a tree directory, the file name provided by the user is a path name made up of several component names, so the directories must be searched level by level, one component at a time.
- Hash method: The system converts the file name provided by the user into an index value for the file directory and then uses that index value to look up the directory. This improves the search speed.
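Linear search over a path name can be sketched as follows. The directory representation is an assumption made for the example: a directory is a list of `(name, child)` entries, where `child` is either another such list (a subdirectory) or a string standing in for the file's FCB or index node. The function name `resolve` and the sample names are invented.

```python
def resolve(root, path):
    """Resolve a path such as '/usr/alice/notes.txt', one component at a time."""
    node = root
    for component in [c for c in path.split("/") if c]:
        if not isinstance(node, list):
            return None                     # a file cannot contain further components
        for name, child in node:            # linear search of this directory level
            if name == component:
                node = child
                break
        else:
            return None                     # component not found at this level
    return node

root = [
    ("usr", [("alice", [("notes.txt", "FCB#7")])]),
    ("etc", []),
]
print(resolve(root, "/usr/alice/notes.txt"))
print(resolve(root, "/usr/bob"))
```

Under the hash method, each directory would instead be keyed by a hash of the file name (for instance a Python `dict`), replacing the inner loop with a single table lookup.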
File sharing
File sharing lets multiple users (processes) use the same file while the system keeps only one copy of it. If the system offered no sharing facility, every user who needed the file would have to keep a private copy, wasting a great deal of storage space. As computer technology has developed, the scope of file sharing has grown from single-machine systems to multi-machine systems and then, through networks, to the whole world. Sharing at that scale is realized by distributed file systems, remote file systems, and distributed information systems, which allow multiple clients to share a server’s files over the network using the client/server (C/S) model.
Sharing based on index nodes (hard links)
With this method, the file’s properties and physical address are no longer stored in the directory entry but in the index node; the directory entry holds only the file name and a pointer to the index node. In Linux, index nodes are called inodes. The index node also contains a counter, count, that records how many users (directory entries) share the file.
Only when count = 1 can the user who created the file actually delete it; otherwise the file itself cannot be deleted. This rule is necessary because, if the creator deleted a file that others still share and the system then reused the freed index node for a new file, the other users’ links would lead to a different file, and sharing would lose its meaning.
Therefore, when the user who created a shared file no longer needs it, that user cannot delete the file outright. Instead, the system decrements the file’s count by 1 and deletes the corresponding directory entry in that user’s own directory, and the other users can still use the shared file. When count reaches 0, no user is using the file and the system deletes it. This leads to the next type of file sharing.
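On a POSIX-style system the link counter can be observed directly. The sketch below uses Python's `os.link` and the `st_nlink` field of `os.stat`; it assumes a file system that supports hard links, and the function name `hard_link_demo` and both file names are invented for the example.

```python
import os
import tempfile

def hard_link_demo(directory):
    original = os.path.join(directory, "report.txt")
    alias = os.path.join(directory, "shared_report.txt")
    with open(original, "w") as f:
        f.write("data")

    counts = [os.stat(original).st_nlink]       # one directory entry so far
    os.link(original, alias)                    # second entry for the same index node
    counts.append(os.stat(original).st_nlink)
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino

    os.remove(original)                         # the creator removes its own entry
    counts.append(os.stat(alias).st_nlink)      # the file lives on under the alias
    with open(alias) as f:
        content = f.read()
    return counts, same_inode, content

with tempfile.TemporaryDirectory() as scratch:
    counts, same_inode, content = hard_link_demo(scratch)
    print(counts, same_inode, content)
```

Removing the original name only decrements the counter; the data stays reachable through the remaining name until the counter drops to zero.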
Sharing using symbolic links (soft links)
When a user needs access to a shared file, the system creates a link file in that user’s directory. The link file contains the path name of the shared file, and the user reaches the shared file through it. If the owner deletes the file or changes its path, the existing links become invalid and the other users must create new link files. With symbolic links, only the owner of the file has a pointer to its index node; the other users who share the file have only its path name. In this way, when the owner deletes a shared file, no dangling pointer to an index node is left behind.
Symbolic links have one great advantage: to share a file over a network, it is enough to provide the network address of the machine that holds the file and the file’s path on that machine. There are still problems, however. Suppose the owner deletes a file that is shared through symbolic links, and, before any of the other users follows their link, someone creates another file with the same name in the same directory. The links are then still valid, but they now lead to a different file, which causes errors.
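Both behaviours described above, the dangling link and the link that silently resolves to a replacement file, can be reproduced with Python's `os.symlink`. This is a sketch for a POSIX-style system (creating symbolic links may need extra privileges elsewhere); the function name `symlink_demo` and the file names are invented.

```python
import os
import tempfile

def symlink_demo(directory):
    target = os.path.join(directory, "report.txt")
    link = os.path.join(directory, "link_to_report")
    with open(target, "w") as f:
        f.write("original")

    os.symlink(target, link)                    # the link file stores only a path name
    stores_path = os.readlink(link) == target
    with open(link) as f:
        before = f.read()

    os.remove(target)                           # the owner deletes the file
    dangling = os.path.islink(link) and not os.path.exists(link)

    with open(target, "w") as f:                # someone creates a new file with that name
        f.write("different file")
    with open(link) as f:
        after = f.read()                        # the old link now reaches the new file
    return stores_path, before, dangling, after

with tempfile.TemporaryDirectory() as scratch:
    result = symlink_demo(scratch)
    print(result)
```

The link never held a pointer to the index node, so nothing dangles at the index-node level; the cost is that the path is re-resolved on every access and may lead somewhere unexpected.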
File protection
To prevent shared files from being damaged or modified by unauthorized users, the file system must control users’ access to files, that is, decide who is permitted to read, write, and execute each file. A corresponding file protection mechanism must therefore be built into the file system.
File protection is implemented by password protection, encryption protection and access control. Among them, password protection and encryption protection are to prevent users’ files from being accessed or stolen by others, and access control is to control users’ access to files.
Access type
File protection can start by limiting the types of access to files. The main types of access that can be controlled are:
- Read: To read from a file.
- Write: Write to a file.
- Execute: Load the file into memory and execute.
- Append: Add new information to the end of the file.
- Delete: Delete the file and release its space.
- List: List file names and file properties.
Access control
The most common approach to access control is to base it on user identity. The usual way to implement identity-based access is to attach an access control list (ACL) to each file and directory, specifying each user name and the types of access that user is allowed.
The advantage of this approach is that it can express complex access policies. The disadvantage is that the length of the list is unpredictable, which complicates space management; this can be solved by using a compact access list.
A compact access list uses three types of users: owner, group, and others.
- Owner: the user who creates the file.
- Group: A group of users who need to share files and have similar access.
- Others: all other users in the system.
With this scheme the access list needs only three fields, one listing the access permissions of each of the three categories of users. When a file is created, the system records the creator’s user name and group name as the file’s owner and group in the file’s FCB. When a user accesses the file, the permissions that apply depend on who the user is: the owner is checked against the owner permissions, a user in the same group as the owner is checked against the group permissions, and any other user is checked against the permissions for others. UNIX operating systems use this method.
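The three-category check can be sketched with the permission-bit constants in Python's `stat` module. This is a simplified model rather than a faithful copy of any kernel's logic: the function name `allowed` and the numeric user and group IDs are invented, and special cases such as the superuser are ignored.

```python
import stat

# Permission bit for each access type, per category (owner, group, others).
BITS = {
    "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
    "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
    "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
}

def allowed(mode, file_uid, file_gid, uid, gids, want):
    """Decide whether user `uid` (member of groups `gids`) may perform `want`.

    Exactly one category applies: owner first, then group, then others.
    """
    owner_bit, group_bit, other_bit = BITS[want]
    if uid == file_uid:
        return bool(mode & owner_bit)
    if file_gid in gids:
        return bool(mode & group_bit)
    return bool(mode & other_bit)

mode = 0o640    # owner: read/write, group: read, others: nothing
print(allowed(mode, 1000, 50, uid=1000, gids={50}, want="w"))   # owner writing
print(allowed(mode, 1000, 50, uid=1001, gids={50}, want="w"))   # group member writing
print(allowed(mode, 1000, 50, uid=1002, gids={60}, want="r"))   # other user reading
```

Three fields of three bits each are enough to store the whole list, which is what makes this form compact compared with a full per-user ACL.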
Passwords and encryption are the other two methods of access control.
With the password method, the user supplies a password when creating a file. The system attaches the password to the file’s FCB when it creates the FCB, and the password is given to the other users who are allowed to share the file. Users must provide the password when requesting access. This method costs little time and space, but its disadvantage is that the password is stored directly in the system, which is not safe.
With the encryption method, the user encrypts the file, and a key is required to access it. This method offers strong confidentiality and saves storage space, but encoding and decoding take some time.
Passwords and encryption both prevent a user’s files from being accessed or stolen by others. They do not control the type of access to a file.