The code is on Github.
The experiment involved modifying the file system to support larger files and symbolic links, and the experiment itself was not very complicated. But the file system is arguably the most complex part of XV6. The entire file system consists of seven layers: file descriptors, pathnames, directories, inodes, logs, buffers, and disks.
File descriptors are similar to Linux, abstracting files, pipes, devices, sockets, and so on into file descriptors that can be read and written using read and write system calls. Read and write in XV6 use if-else to determine the descriptor type and select the corresponding underlying function. In Linux, function Pointers are used to point directly to the underlying function, avoiding multiple judgments.
The pathname provides the ability to find files from the directory system based on the pathname. Avoid deadlocks that may occur during pathfinding, such as pathnames containing.. .
The directory layer is similar to a file. The directory entry struct Dirent of the directory is stored inside the directory file, which contains the file name and the corresponding inode number. Directory lookup in XV is performed by traversing an array of directory entries for sequential comparisons in O(n) time; In file systems such as NTFS and ZFS, the disk balance tree is used to organize directory entries, reducing the complexity of directory search to O(LGN).
The inode layer is the organization of files on the disk. An area in the disk is used to store inode information, including the file type, size, number of links, and the disk block number corresponding to each block of the file. The inode number can be obtained from the directory system through the path, and the inode information can be read from the disk. The disk block number can be found based on the offset, and then the disk block number can be read and written.
The logging layer provides transaction and fail-over capabilities, which are used when multiple disk operations must be performed atomically (deleting files from directories, deleting inodes corresponding to files, modifying free block bitmaps, etc.). The log writes operations to the log area of the disk, writes commit after completion, and finally writes all operations to disk. When a failure occurs before writing commit, no operation is required because the transaction has not been committed; When a failure occurs after a COMMIT is written, the entire log in the log area is rewritten to ensure that the transaction is committed correctly.
The buffer provides the cache of disk blocks and ensures that there is only one disk block in the buffer. In this way, only one thread can operate on the same block at a time to avoid data inconsistency.
Large files (moderate)
The experiment was to make XV6 support larger files. The file block number dinode.addr in the original XV6 was used with a direct block table of size 12 and a level 1 block table of size 256, i.e., a maximum file size of 12+256 blocks. The system can support files of 11+256+256*256 blocks by replacing an item in a direct block table with a secondary block table.
First modify the corresponding macro and inode definition.
#define NDIRECT 11
#define NINDIRECT (BSIZE / sizeof(uint))
#define MAXFILE (NDIRECT + NINDIRECT + NINDIRECT * NINDIRECT)
struct dinode {. uint addrs[NDIRECT+2]; // Data block addresses
};
struct inode {. uint addrs[NDIRECT+2]; // Data block addresses
};
Copy the code
The BMAP function was then modified to support a secondary block table, essentially repeating the block table query process once.
static uint
bmap(struct inode *ip, uint bn)
{... bn -= NINDIRECT;if(bn < NINDIRECT * NINDIRECT){
// double indirect
int idx = bn / NINDIRECT;
int off = bn % NINDIRECT;
if((addr = ip->addrs[NDIRECT + 1= =])0)
ip->addrs[NDIRECT + 1] = addr = balloc(ip->dev);
bp = bread(ip->dev, addr);
a = (uint*)bp->data;
if((addr = a[idx]) == 0){
a[idx] = addr = balloc(ip->dev);
log_write(bp);
}
brelse(bp);
bp = bread(ip->dev, addr);
a = (uint*)bp->data;
if((addr = a[off]) == 0){
a[off] = addr = balloc(ip->dev);
log_write(bp);
}
brelse(bp);
return addr;
}
panic("bmap: out of range");
}
Copy the code
Finally, modify itrunc function to release the block corresponding to the level 2 block table, mainly pay attention to brelse call, just like the level 1 block table processing.
void
itrunc(struct inode *ip)
{...if(ip->addrs[NDIRECT + 1]){
bp = bread(ip->dev, ip->addrs[NDIRECT + 1]);
a = (uint*)bp->data;
struct buf *bpd;
uint* b;
for(j = 0; j < NINDIRECT; j++){
if(a[j]){
bpd = bread(ip->dev, a[j]);
b = (uint*)bpd->data;
for(int k = 0; k < NINDIRECT; k++){
if(b[k])
bfree(ip->dev, b[k]);
}
brelse(bpd);
bfree(ip->dev, a[j]);
}
}
brelse(bp);
bfree(ip->dev, ip->addrs[NDIRECT + 1]);
ip->addrs[NDIRECT + 1] = 0;
}
ip->size = 0;
iupdate(ip);
}
Copy the code
Symbolic links (moderate)
This experiment is to realize symbolic link, symbolic link is to save the path name pointing to the file in the file, when opening the file according to the saved path name to find the actual file. The opposite of a symbolic link is a hard link, which points the file’s inode to the target file’s inode and increments the reference count by one.
Symlink system calls are also simple to implement. Create an inode, set type to T_SYMLINK, and write the path to the target file to the inode.
uint64
sys_symlink(void)
{
char target[MAXPATH];
memset(target, 0.sizeof(target));
char path[MAXPATH];
if(argstr(0, target, MAXPATH) < 0 || argstr(1, path, MAXPATH) < 0) {return - 1;
}
struct inode *ip;
begin_op();
if((ip = create(path, T_SYMLINK, 0.0)) = =0){
end_op();
return - 1;
}
if(writei(ip, 0, (uint64)target, 0, MAXPATH) ! = MAXPATH){// panic("symlink write failed");
return - 1;
}
iunlockput(ip);
end_op();
return 0;
}
Copy the code
When the mode is not O_NOFOLLOW, the symbolic link is looping until the actual file is found. If the looping exceeds a certain number of times (10), the circular link may have occurred and -1 is returned. The main thing to note here is that namei does not lock IP. You need to use ilock to lock IP, whereas create does.
uint64
sys_open(void)
{...if(ip->type == T_DEVICE && (ip->major < 0 || ip->major >= NDEV)){
...
}
if(ip->type == T_SYMLINK){
if(! (omode & O_NOFOLLOW)){int cycle = 0;
char target[MAXPATH];
while(ip->type == T_SYMLINK){
if(cycle == 10){
iunlockput(ip);
end_op();
return - 1; // max cycle
}
cycle++;
memset(target, 0.sizeof(target));
readi(ip, 0, (uint64)target, 0, MAXPATH);
iunlockput(ip);
if((ip = namei(target)) == 0){
end_op();
return - 1; // target not exist} ilock(ip); }}}if((f = filealloc()) == 0 || (fd = fdalloc(f)) < 0){
...
}
Copy the code