This is the third and final installment of the process, which covers a bit more class capacity, so it’s called a hodgepodge. This article addresses the remaining issues: loading the application, creating the first process, waking up the process from sleep, waiting to exit, and locking synchronization. Finally, we have a brief talk about the runtime, without further discussion, one by one.
loader
Use the double Act (
Creating a process that executes the same program as the parent process is usually not what we want. What we want is for the child process to execute a different program, which requires learning another skill called Shape-shifter (
)
Execexecexec is responsible for loading files from disk into memory and creating new memory images, a process that is both simple and complex. Because the elfelfelf file contains all the information about the loadable segment: how big the segment is and where to load it, all execexec needs to do is read the data from disk and move it to the appropriate location. The complexity lies in the complexity of the process. Let’s go through it step by step:
int exec(char *path, char **argv);
Copy the code
There are two arguments: pathPathPath is the file path, and argvargvargv is a pointer to an array of strings
if((ip = namei(path)) == 0) {// Get the inode of the file
end_op();
cprintf("exec: fail\n");
return - 1;
}
ilock(ip); // Lock the inode to make the data valid
Copy the code
This step is to parse the path and retrieve the file’s inodeinodeinode, return it on failure, lock the inodeinodeinode on success, and ilockilockilock makes inodeinodeinode data valid. See the inodeinodeinode section in the previous file system series on inodes
if(readi(ip, (char*)&elf, 0.sizeof(elf)) ! =sizeof(elf)) // Read elf headers
goto bad;
if(elf.magic ! = ELF_MAGIC)// Determine the file type
goto bad;
//#define ELF_MAGIC 0x464C457FU // "\x7FELF" in little endian
Copy the code
The first four bytes of elfelfelf are 0x7F, ‘E’, ‘L’, and ‘F’ respectively. If the first four bytes of a file are these four things, it is an Elfelfelf file.
if((pgdir = setupkvm()) == 0)
goto bad;
Copy the code
Create a new page table for the kernel part of the old process, create a new page table of its own, that is, create a new virtual address space.
The next step is to read the loadable segment and “move” to the new address space. Currently, the virtual address space is only mapped to the kernel part, and the user part is not mapped to the actual physical memory. To move data to the user part, there must be a process of allocating physical memory, and then mapping the virtual memory of the user part to the allocated physical memory. This is the allocuvmallocuvmallocuvm function.
Allocating virtual memory
int allocuvm(pde_t *pgdir, uint oldsz, uint newsz);
Copy the code
Allocuvmallocuvmallocuvm allocates newsz−oldsznewsz – oldsznewsz−oldsz virtual memory in the virtual address space pointed to by pgDIRPGdirPGdir, returning newsznewsznewsz. Allocating virtual memory is the two steps described above: allocating physical memory, filling in the page table to create the mapping, and let’s see how it works:
a = PGROUNDUP(oldsz); / / 4 k alignment
for(; a < newsz; a += PGSIZE){
mem = kalloc(); // Allocate physical memory
if(mem == 0) {// If the assignment fails
cprintf("allocuvm out of memory\n");
deallocuvm(pgdir, newsz, oldsz); // Recycle newsz to oldsz
return 0;
}
memset(mem, 0, PGSIZE); // Clear the data in memory
if(mappages(pgdir, (char*)a, PGSIZE, V2P(mem), PTE_W|PTE_U) < 0) {// Fill in the page entry to create the mapping
cprintf("allocuvm out of memory (2)\n"); // If the mapping fails to be established
deallocuvm(pgdir, newsz, oldsz); // Recycle newsz to oldsz
kfree(mem); // Reclaim the allocated MEM space
return 0; }}return newsz;
Copy the code
The whole process is the two step, that I said if one which step go wrong, just call deallocuvmdeallocuvmdeallocuvm function will be allocated space recycling, Deallocuvmdeallocuvmdeallocuvm is completely allocuvmallocuvmallocuvm inverse operation, recovery room after the release page table entries, there is no longer here.
Because the virtual addresses of programs in memory start at 0, the sizesizesize value is not only the size of the program, but also the address at the end of the program, The allocuvmallocuvmallocuvm allocates virtual memory following the current program’s oldsizeoldsizeoldsize, so the allocuvmallocuvmallocuvm function does as follows:
Move the data
int loaduvm(pde_t *pgdir, char *addr, struct inode *ip, uint offset, uint sz){
if((uint) addr % PGSIZE ! =0) // The address must be aligned
panic("loaduvm: addr must be page aligned");
for(i = 0; i < sz; i += PGSIZE){
if((pte = walkpgdir(pgdir, addr+i, 0)) = =0) //addr+ I Indicates the page entry address of the address page
panic("loaduvm: address should exist");
pa = PTE_ADDR(*pte); // The physical address recorded by the page entry
if(sz - i < PGSIZE) // Determine the number of bytes that can be transported at one time
n = sz - i;
else
n = PGSIZE;
if(readi(ip, P2V(pa), offset+i, n) ! = n)// Call readi to move data to address P2V(pa)
return - 1;
}
return 0;
}
Copy the code
Moving data from disk to memory encapsulates the readireadireadi function, which reads the data to the appropriate location according to the file inodeinodeinode. See the inode, Directory, and Path section for more information about the readireadireadi function.
With that in mind, go back to the execexec function:
sz = 0;
for(i=0, off=elf.phoff; i<elf.phnum; i++, off+=sizeof(ph)){
if(readi(ip, (char*)&ph, off, sizeof(ph)) ! =sizeof(ph)) // Read the program header
goto bad;
if(ph.type ! = ELF_PROG_LOAD)// If it is loadable segment
continue;
if(ph.memsz < ph.filesz) //memsz should not be less than filesz
goto bad;
if(ph.vaddr + ph.memsz < ph.vaddr) //memsz should not be negative either
goto bad;
if((sz = allocuvm(pgdir, sz, ph.vaddr + ph.memsz)) == 0) // Allocate virtual memory between sz and ph.vaddr + ph.memsz
goto bad;
if(ph.vaddr % PGSIZE ! =0) // The address should be aligned
goto bad;
if(loaduvm(pgdir, (char*)ph.vaddr, ip, ph.off, ph.filesz) < 0) // Move filesz bytes of data from disk to vaddr
goto bad;
}
Copy the code
This code is the process of loading the program, the whole process should be very clear after the above preparation, if not, may not be familiar with the Elfelfelf file, I use the double body transformation to create a new process described in the article about elfelfelf file, this article will not repeat. Just to mention one important point: Filesz
After loading the program from disk into memory, it’s time to allocate resources for the process to prepare the environment to run.
Assign user stack
sz = PGROUNDUP(sz); / / the size alignment
if((sz = allocuvm(pgdir, sz, sz + 2*PGSIZE)) == 0) // Allocate two pages of virtual memory
goto bad;
clearpteu(pgdir, (char*)(sz - 2*PGSIZE)); // Clear the USER bit of the first page entry to make it inaccessible in USER mode
sp = sz;
Copy the code
First, round up SZSZSZ, the process in memory image of each segment is 4K4K4K aligned, stack segment is no exception, so in fact, each segment is not next to each other, if not just aligned, then there is a gap between.
Allocuvmallocuvmallocuvm is called to allocate two pages of virtual memory, with the “high page” as the stack, and the “low page” as the protected page because the stack is scale-down. The so-called protected page is the corresponding page entry USERUSERUSER attribute position 0, after 0 user mode cannot access the page.
Assign the current program size to SPSPSP. SPSPSP is the future stack pointer. The memory image of the current program is shown below:
To prepare parameters
The program to run starts with the mainmainmain function, which takes two arguments:
- Argcargcargc is the number of arguments
- Argvargvargv a pointer to an array of strings
In this section we need to “move” the two arguments to mainmainMain and the strings that argvargvargv points to onto the stack. Because mainmainmain function is also a function, it is a function to follow the call convention, mainmainmain needs to push the parameters, let’s look at the parameter layout diagram:
Moving parameters is an interesting thing. Unlike moving data from disk to memory, which is shared by each process, process A moves data from disk to process B’s address space, no problem. A readireadireadi function does that. However, the transport parameters are different. The parameters are in the address space of process AAA and the destination is in the address space of process BBB. The two Spaces cannot communicate with each other and cannot be directly transported.
How say? The natural idea is to find a shared and interworking place to transfer between the two. This place is the kernel, and use the kernel as a hub to see how the related functions are implemented:
char* uva2ka(pde_t *pgdir, char *uva)
{
pte_t *pte;
pte = walkpgdir(pgdir, uva, 0);
if((*pte & PTE_P) == 0)
return 0;
if((*pte & PTE_U) == 0)
return 0;
return (char*)P2V(PTE_ADDR(*pte));
}
Copy the code
This function converts the user’s virtual address to the kernel’s virtual address. But it is, and to understand this, it is important to understand how xv6XV6xv6 memory is managed. Take a look at this picture again:
The kernel part (not all) maps to the 0− phystop0-phystop0 −PHYSTOP of physical memory, and the user part maps to the free part of physical memory, so the free part of physical memory maps to both the user part and the kernel part. Two user partial virtual addresses with the same value correspond to different physical addresses, but two kernel partial virtual addresses with the same value do correspond to the same physical addresses.
KERNBASE(0x80000000)KERNBASE(0x8000 0000)KERNBASE(0x80000000) KERNBASE(0x80000000) Minus kernBase kernBase is the physical address value. Instead, the physical address plus KERNBASEKERNBASEKERNBASE is its virtual address in the kernel
However, there is no direct correspondence between the virtual address of the user part, and the address translation must refer to the page table. The virtual address of the user part is converted into a physical address, and then baseKerNBasekernBase is converted into the virtual address of the kernel. The above function does just that.
User virtual address and kernel virtual address after conversion can be carried out parameter handling, see:
int copyout(pde_t *pgdir, uint va, void *p, uint len)
{
char *buf, *pa0;
uint n, va0;
buf = (char*)p;
while(len > 0){
va0 = (uint)PGROUNDDOWN(va);
pa0 = uva2ka(pgdir, (char*)va0); // Va0 kernel address under pgDIR mapping
if(pa0 == 0)
return - 1;
n = PGSIZE - (va - va0); // The number of bytes carried at a time
if(n > len)
n = len;
memmove(pa0 + (va - va0), buf, n); // Carry n bytes from buf
len -= n;
buf += n;
va = va0 + PGSIZE;
}
return 0;
}
Copy the code
For example, now I want to carry N bytes from the address of user part of process AAA to the address of user part of process BBB, va_b_uva__b \_uva_b_u.
- Call uVA2kauVA2kauVA2ka to convert va_b_uva\ _B \_uva_b_u to the kernel virtual address va_b_kva\ _B \_kva_b_k
- Call memmovememmovememmove to move n bytes from va_A_uva \ _A \_uva_a_u to va_b_kva\ _B \ _kva_b_K
It seems to move the data to the kernel va_b_kva\ _B \_kva_b_k, but va_b_Kvva \ _B \_kva_b_k corresponds to va_b_uva_b_u, So logically it does move the data from Va_A_uva \ _A \ _uva_A_u to vA_b_uva__b \_uva_b_u
With that in mind, let’s look at the execexec function:
uint ustack[3+MAXARG+1];
//#define MAXARG 32 // max exec arguments
for(argc = 0; argv[argc]; argc++) {
if(argc >= MAXARG)
goto bad;
sp = (sp - (strlen(argv[argc]) + 1& ~))3; // Move the stack pointer down to prepare for the string, ensuring 4-byte alignment
if(copyout(pgdir, sp, argv[argc], strlen(argv[argc]) + 1) < 0) // Move the string
goto bad;
ustack[3+argc] = sp; // Record the address of the string in the stack
}
Copy the code
This part moves the string to the new user empty stack. It is a string, not a parameter. The stack pointer is moved down, leaving space for the string, which starts with a 444 byte alignment. Log the address in ustackustackustack.
ustack[3+argc] = 0;
ustack[0] = 0xffffffff; // False return address
ustack[1] = argc; / / parameters
ustack[2] = sp - (argc+1) *4; // argv pointer argument
sp -= (3+argc+1) * 4; // The stack pointer moves down
if(copyout(pgdir, sp, ustack, (3+argc+1) *4) < 0) // Move the string address to the new user space stack
goto bad;
Copy the code
There is nothing left to say about the transport parameters and the address of the string.
Back to the execexec function:
for(last=s=path; *s; s++) // Parse the program name
if(*s == '/')
last = s+1;
safestrcpy(curproc->name, last, sizeof(curproc->name)); //strncpy
Copy the code
This part changes the process name according to the parameter path of execexecexec, which is a combination of file names and the last item is the name of the program.
oldpgdir = curproc->pgdir; // Old page directory
curproc->pgdir = pgdir; // Change the page directory
curproc->sz = sz; // Change the program size
curproc->tf->eip = elf.entry; // Set the entry point for execution
curproc->tf->esp = sp; // New user top of stack
switchuvm(curproc); // Switch the page table
freevm(oldpgdir); // Free up old user space
return 0;
Copy the code
This part modifies some attributes of the process task structure and changes eipeipeIP in the stack frame to the entry point of Elfelfelf, where the process will start execution when scheduled.
Freevmfreevmfreevm used to release the user space mapping all physical memory and memory page table itself, to see how to implement:
void freevm(pde_t *pgdir)
{
uint i;
if(pgdir == 0)
panic("freevm: no pgdir");
deallocuvm(pgdir, KERNBASE, 0); // Free Physical memory for user space mapping
for(i = 0; i < NPDENTRIES; i++){ // Free Physical memory occupied by the page table
if(pgdir[i] & PTE_P){
char * v = P2V(PTE_ADDR(pgdir[i]));
kfree(v);
}
}
kfree((char*)pgdir); // Free Physical memory occupied by the page directory
}
Copy the code
The process is clear, and the functions involved are described earlier, in three steps, noted in the comments. This section removes the original memory image of the process and replaces it with the new image.
return
Execexecexec is a system call. The flow of a system call is shown in the diagram above, using writeWriteWrite as an example. Here’s a look:
The basic flow of each system call is similar, as shown in the figure above. Here we focus on the return part. When the kernel function (the function in which the system call actually works) completes its execution, it goes to the kernel stack to get the return address: Traprettraprettrapret Specifies the address of the interrupt exit function. This function is executed to interrupt the exit operation. The last instruction is iretiret, after which the user state is returned.
But execexec is different. For example, some function in the process memory image AAA calls execexec and then goes into the kernel to execute the actual function SYS_EXECsys \ _EXECsys_exec. Execsys \_execsys_exec creates a new memory image BBB. After executing sys_EXECsys \_execsys_exec, the same address is returned to the kernel stack: Traprettraprettrapret, however, does not return the old memory image AAA, but the new memory image BBB, because AAA is gone, for the following reasons:
- User level context trapframetrapframetrapframe eipeipeip has been amended as elfelfelf entry point
- The user stack pointer has been modified to the top of the new user stack
- Switch to the new page table, the old page table has been released, the page table can be regarded as a virtual address space.
Take a look at the process diagram for the Execexec system call:
Here’s a flowchart of what Execexec actually does:
Ok, so that’s the end of the execexec. this line should be pretty clear.
Create the first process
Have created earlier to load in the ordinary course of process and bedding, creating the first process is very simple, relevant code in proc. C/userinitproc. C/userinitproc. C/userinit. The first process is like a fork and exec in combination. Since it is the first, there is no memory image clone of the parent process, so execexec is called to load initinitinit. Xv6xv6xv6 actually has a slightly different situation, which first loads an initCodeinitcodeinitCode program, which then calls execexec to load initinitinit program. Look at the actual source code:
void userinit(void)
{
struct proc *p;
extern char _binary_initcode_start[], _binary_initcode_size[];
p = allocproc(); // Assign a task structure to reserve context space
initproc = p;
if((p->pgdir = setupkvm()) == 0) // Create the kernel part of the page table
panic("userinit: out of memory?");
inituvm(p->pgdir, _binary_initcode_start, (int)_binary_initcode_size); // Initialize the virtual address space and load initcode into it
p->sz = PGSIZE;
Copy the code
This part assigns the task structure, then calls inituvminituvminituVM to initialize the virtual address space, and installs the initCodeInitCodeInitCode program into it. Initcodeinitcodeinitcode is the initializer that the process executes when it returns to user space, and to run in user space, it allocates memory in user space and loads the corresponding program into it, and that’s what inituvminituvminituVM does.
void inituvm(pde_t *pgdir, char *init, uint sz)
{
char *mem;
if(sz >= PGSIZE)
panic("inituvm: more than a page");
mem = kalloc(); // Allocate a page of physical memory
memset(mem, 0, PGSIZE); / / reset
mappages(pgdir, 0, PGSIZE, V2P(mem), PTE_W|PTE_U); // Map to virtual address space 0-4KB
memmove(mem, init, sz); // Move the initializer to be run to 0-4KB
}
Copy the code
Inituvminituvminituvm does the following:
- Allocate a page of physical memory to clear
- Map this page of physical memory to 0−4KB0-4KB0−4KB of the virtual address space
- Load the binary initializer into the 0−4KB0-4KB0−4KB area
In the execexecexec we are the loadable segment of the elfelfelf file that is loaded, and here we initialize initCodeInitCodeInitCode program is not compiled into the Elfelfelf executable when it is compiled, Instead, it is compiled into machine-code binary files with no extra information that can be loaded directly into memory and run, unlike elfelfelf files that can only load loadable parts and run. Initcodeinitcodeinitcode is also not a single program file, but is compiled into the entire kernel file along with something else, so just use memMovememMovememmove. See MakefileMakefileMakefile for more details.
Back to the userinitUserinitUserinit function:
p->sz = PGSIZE;
memset(p->tf, 0.sizeof(*p->tf));
p->tf->cs = (SEG_UCODE << 3) | DPL_USER;
p->tf->ds = (SEG_UDATA << 3) | DPL_USER;
p->tf->es = p->tf->ds;
p->tf->ss = p->tf->ds;
p->tf->eflags = FL_IF;
p->tf->esp = PGSIZE;
p->tf->eip = 0; // beginning of initcode.S
Copy the code
Reserved in allocprocallocprocallocproc interrupt stack frame space, filler content, here the code segment selector code segment selector for the user, the data segment selector for the user data segment selector and stack ESESES SSSSSS added several paragraphs, and so on are Shared data segment selector. Set the IFIFIF bit of eFLAGSeFlagseFlags in the stack frame to allow the response to interrupt when the interrupt exits by popping it into the eFLAGSeFlags register.
Instead of assigning a user stack to this temporary initializer, it uses the mapped 0−4KB0-4KB0−4KB high address directly, with 4KB4KB4KB as the bottom of the stack. As we will see later, the initializer is small enough to be used this way.
The eipeipeIP in the stack frame is set to 0, because the inituvminituvminituVM program can be seen from the previous, which moves the initCodeinitCodeinitCode to address 0. Different from elfelfelf file, it has a special entry point, which can be executed directly from the beginning, so eiPeipeIP is set to 0.
safestrcpy(p->name, "initcode".sizeof(p->name)); // Process name
p->cwd = namei("/"); // Work in the root directory
acquire(&ptable.lock);
p->state = RUNNABLE; // Change the status to RUNNABLE
release(&ptable.lock);
}
Copy the code
The last part is easy, just like forkfork, to set the name of the process, the path to work, and the state, not much to say.
The first process has just loaded initCodeInitCodeinitCode and is ready to execute.
The first process is scheduled to run on CPUCPUCPU and the first function that CPUCPUCPU executes is forkretforkretret, which we said would be an empty function if it were a normal function, but the first function would do the actual thing:
void forkret(void)
{
static int first = 1;
release(&ptable.lock);
if (first) {
first = 0;
iinit(ROOTDEV); // Initialize the inode
initlog(ROOTDEV); // Initialize logs to recover data from logs and maintain disk data consistency}}Copy the code
There are two main things to do:
- Initialize the inodeinodeinode, mainly the inodeinodeinode lock initialization
- Initialize logs and recover data from logs to ensure disk data consistency.
Forkretforkretforkret is a function that must be executed every time a new process is scheduled. It is not a function in initCodeInitCodeinitCodeInitCode.
Globl start start: argv pushl $init init pushl $0 Movl $SYS_exec, %eax #exec int $T_SYSCALL exit(); Movl $SYS_exit, %eax #exit int $T_SYSCALL #exit JMP # char init[] = "/init\0"; Init:.string "/init\0" # char *argv[] = {init, 0}; Prepare exec second argument string array.p2align 2 argv:.long init.long 0Copy the code
So initCodeInitcodeinitcode is a program that calls execexecexec to load initinitInit, and this code itself should be easy to understand, if not, Look back at the previous tutorial on execexecexec and how to implement the call.
Let’s just say that execexec will not return the exitexitexit function if it succeeds, for the reason covered in the execexec section. Finally, what does initinitinit look like? There are some things I haven’t talked about yet, so here’s the general flow:
int main(void)
{
int pid, wpid;
if(open("console", O_RDWR) < 0) {// Open the console
mknod("console".1.1);
open("console", O_RDWR);
}
dup(0); // stdout
dup(0); // stderr
for(;;) {printf(1."init: starting sh\n");
pid = fork(); //fork a child process
if(pid < 0) {printf(1."init: fork failed\n");
exit(a); }if(pid == 0) {// The child process runs shell
exec("sh", argv);
printf(1."init: exec sh failed\n");
exit(a); }while((wpid=wait()) >= 0&& wpid ! = pid)//wait()
printf(1."zombie! \n"); }}Copy the code
The initinitinit program does two things:
- Open the console file, and forkforkfork a child process to run shellshellshell
- Waitwaitwait child process, which includes orphan process
Ok, so much for the first process, let’s end with a general picture:
Dormancy and awakening
Dormancy and wake up are not complicated at all, even simple. The difficulty lies in synchronizing with locks. Let’s discuss the problem of locks later, and let’s look at the practical operation of dormancy and wake up first
void sleep(void *chan, struct spinlock *lk);
void wakeup(void *chan);
Copy the code
Enter the dormant
Sleepsleepsleep prototype:
void sleep(void *chan, struct spinlock *lk);
Copy the code
LKLKLK is the lock that manages the object but is not necessarily owned by Chanchanchan. For example, sleep(curproc, & table.lock), All task structures have a single lock to manage task structures, such as sleep(&log, &log.lock), which is unique to Logloglog.
Note that sleepsleepsleep is not the system call sleepsleepsleep, To avoid collisions, I’ll say sleep_syssleep_syssleep_sys (sys_sleepsys_sleep is an actual kernel function). The xv6xv6xv6 code has this problem, some user interfaces, kernel functions, and some functions in the file are named the same, but we have to treat them differently when we actually look at the code. Sleep_syssleep \ _syssleep_sys are implemented according to the sleepsleepsleep here, simple way, every time the clock ticks interrupt will increase, according to the number of ticks can identify the time to sleep, If not, keep calling SleepSleepSleep. The implementation principle is very simple, you can take a look, this article will not say more, here is mainly to see how specific dormancy is carried out.
void sleep(void *chan, struct spinlock *lk){
/ * * * * * * * * * just * * * * * * * * * * /
p->chan = chan; // Hibernate on chan
p->state = SLEEPING; // The state changes to SLEEPING
sched(); // Allocate CPU scheduling
p->chan = 0; // Change the sleep object to 0
/ * * * * * * * * * just * * * * * * * * * * /
}
Copy the code
The entire sleepSleepsleep function actually does that, setting the current process sleep object and state, then calling schedschedsched to let CPUCPUCPU go, A return from schedschedsched indicates that the current process has finished sleeping and is scheduled on CPUCPUCPU again. It is no longer sleeping so changing the sleep object to table 0 does not.
Out of sleep
static void wakeup1(void *chan)
{
struct proc *p;
for(p = ptable.proc; p < &ptable.proc[NPROC]; p++)
if(p->state == SLEEPING && p->chan == chan) // Look for processes that are dormant and on chan
p->state = RUNNABLE; // Change its state to RUNNABLE
}
Copy the code
Wake up is easier. Query the task structure one by one, find the process whose state is SLEEPINGSLEEPINGSLEEPING and whose sleep object is chanchanchan, and change its state to RUNNABLERUNNABLERUNNABLE. Indicates that the process is awakened and can be scheduled on CPUCPUCPU again.
As you can see, going into and out of hibernation is simple enough that it doesn’t need much explanation, except to note that schedsched calls don’t return and instead execute other processes, which is the benefit of hibernation, increasing CPUCPUCPU utilization.
Wait and exit
The parent process waits for the child to exit. Both waitWaitWait and exitexitexit are used to release resources. Exitexitexit is called by the child process itself to release a portion of resources, but it can never be completely released. Since executing the exitexitexit function itself requires resources, such as the stack, exitexitExit can only free a portion of it, leaving the rest to the parent process to process by calling WaitWaitWait.
Child process exits
void exit(void){
struct proc *curproc = myproc();
if(curproc == initproc)
panic("init exiting");
Copy the code
The first process cannot exit and has special functions, such as reclaiming resources from orphaned processes.
for(fd = 0; fd < NOFILE; fd++){
if(curproc->ofile[fd]){
fileclose(curproc->ofile[fd]);
curproc->ofile[fd] = 0; }}Copy the code
This section closes all files and releases the file’s inodeinodeinode if the file’s link count and reference count are both 0. See the previous section on file system calls
begin_op();
iput(curproc->cwd); // Drop the inode of the current working path
end_op();
curproc->cwd = 0; // The current working path is set to 0
Copy the code
This section releases the inodeinodeinode of the current working path, which is a directory, and sets it to 0. In addition, operations involving disk read and write should use the log system to ensure atomic operations, file operations are not detailed, see the previous problems.
acquire(&ptable.lock); / / lock
wakeup1(curproc->parent); // Wake up the parent
Copy the code
The parent process may sleep because of the waitwaitwait call, which wakes up the lock problem later
for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){ // Adopt the abandoned child to init
if(p->parent == curproc){
p->parent = initproc;
if(p->state == ZOMBIE) wakeup1(initproc); }}Copy the code
If the exiting process has child processes, they become orphan processes because the current process (their parent) is exiting and will not call WaitWaitWait to collect the process for them. Orphan processes also need to be handled by passing them all to initinitinit and waking up initinitinit for immediate processing if there are zombie processes.
curproc->state = ZOMBIE; // The status changes to zombie
sched(); // schedule, never return
panic("zombie exit"); // It is not possible to do this normally because it does not return
}
Copy the code
The parent process’s WaitWaitWait will handle the zombie process after schedsched because only RUNNABLERUNNABLERUNNABLE processes can be scheduled. The parent process’s waitWaitWait process does not exist.
Wait for the child process to exit
The waitWaitWait function basically looks for a ZOMBIEZOMBIEZOMBIE process, and when it finds one, it recykes its resources and looks at the code
for(;;) {//" infinite loop"
havekids = 0;
for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){ // Loop to find the child process
if(p->parent ! = curproc)// If the parent of process P is not the current process
continue;
havekids = 1; // The current process has child processes
if(p->state == ZOMBIE){ // If the child is in ZOMBIE state, reclaim its resources
// Found one.
pid = p->pid;
kfree(p->kstack); // Reclaim the kernel stack
p->kstack = 0;
freevm(p->pgdir); // Reclaim user space and page table memory
p->pid = 0;
p->parent = 0;
p->name[0] = 0;
p->killed = 0;
p->state = UNUSED; // The state becomes UNUSED, indicating that the structure is idle
release(&ptable.lock); / / releases the lock
returnpid; }}Copy the code
If the current process has a child in ZOMBIEZOMBIEZOMBIE state, it recycles its resources
if(! havekids || curproc->killed){ release(&ptable.lock); return -1; }Copy the code
If the current process has no children or is killed, release the lock and return that the −1-1−1 table is faulty. Killkillkill and lock issues are discussed later
sleep(curproc, &ptable.lock); //DOC: wait-sleep
}
Copy the code
This part is if the current process has a child process, but the child process has not exited, the parent process sleep to wait for the child process to exit.
The waitwaitwait and exitexitExit functions are used to reclaim the child’s resources, as well as to adopt its child to initinitinit if the exiting process has a child.
With this in mind, the state change diagram should have a deeper understanding:
Exitexitexit is invoked by a process and then exits. Killkillkill is invoked by a process to force another process to exit by exitexitExit. Each task structure has a killedKilledKILLED attribute, which is set to 1 to indicate that the process was “killed”, but the killkillkill operation was not actually performed
int kill(int pid)
{
struct proc *p;
acquire(&ptable.lock); / / lock
for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){ // Loop to find the pid process
if(p->pid == pid){ / / found
p->killed = 1; 1 / / killed
// Wake process from sleep if necessary.
if(p->state == SLEEPING) // If the process is sleeping
p->state = RUNNABLE; / / wake
release(&ptable.lock); / / lock
return 0; // Return correct
}
}
release(&ptable.lock); / / lock
return - 1; // Return error
}
Copy the code
The killkillkill function simply sets killedKilledKILLED to 1 to wake up the process if it is sleeping. But killkillkill doesn’t actually kill the process — exitexitExit is called to make itexit. Where does itexit?
Each process is always interrupted, so check killedKilledKILLED in the general interrupt service program TrapTrapTrap. If killed== 1KILLED == 1KILLED ==1, exitExitExit is called to exit:
void trap(struct trapframe *tf){
/ * * * * * * * * * * just * * * * * * * * * /
if(myproc() && myproc()->killed && (tf->cs&3) == DPL_USER) // If killed
exit(a);/ / exit
if(myproc() && myproc()->state == RUNNING && // A clock interrupt occurred
tf->trapno == T_IRQ0+IRQ_TIMER)
yield(); // Give up the CPU
if(myproc() && myproc()->killed && (tf->cs&3) == DPL_USER) // Check again if killed
exit(a);/ / exit
}
Copy the code
Each process will always enter the kernel, and check the killedKilledKILLED value when leaving the kernel. If killedKilledKILLED is used, call exit. The reason why you checked the yieldyieldyield twice is because you could be killedKilledkilled either before or after the yieldyieldyield gave up CPUCPUCPU, and you checked the yieldyield twice to kill the process in real time, but you didn’t have to check it later because the process always has to go into the kernel again, By then KILLED == 1KILLED == 1KILLED ==1 will still be captured, but the timeliness may not be as good.
A panicpanicpanic will occur when an killed==1killed==1killed==1 killed==1 While panicpanicpanic. A panicpanicpanic is a process that prints a string of error messages and then freezes CPUCPUCPU(infinite loop). Kill (PID) Kill (PID) forces the process to call exit.
LOCK synchronization has always been one of the most complex problems in the operating system, xv6xv6xv6 LOCK design in the LOCK has been discussed in the article, xv6 LOCK design itself is not difficult, rare is the use of LOCK, here according to the process of this part of the use of LOCK to brief talk. The lock-related parts of the process are mainly sleep, wake up, wait, quit, schedule, switch, one by one.
Sleep wake up
Sleep is to sleep on an object, and wakeup is to wakeup a process that is sleeping on an object, so it is natural to declare sleepsleepsleep and wakeupwakeupwakeup:
void sleep(void *obj);
void wakeup(void *obj);
Copy the code
Is that the right statement? Let’s take a simple example of a variant producer consumer:
Send:
while(obj ! =NULL) // The object is still there
; / / loop
obj = 1; // Make objects
wakeup(obj); // Wake up the dormant consumers on it
Recv:
while(obj == NULL) // There are no objects to consume
sleep(obj); / / sleep
obj = NULL; // Consume objects
Copy the code
At first glance it may seem fine, but wakeupwakeupwakeup can cause a deadlock if it occurs before sleepsleepsleep:
If obj=NULLobj =NULLobj =NULL, RecvRecvRecv is ready to sleep, but the call to sleepSleepsleep is delayed because the call to sleepSleepSleep is interrupted. During this time SendSendSend is executed on the other CPUCPUCPU, setting obJobjobj to 1 and waking up RecvRecvRecv that is dormant on obJobjobj. And RecvRecvRecv, when the interrupt ends, it goes to sleep, so sleep is after wake up, sleep misses wake up, deadlocks.
Even if it is not an interrupt, because it is a multi-processor, the time between the two is uncertain, and the above situation is entirely possible. The only way to avoid this inconsistency is to lock the information in the second version:
Send:
lock(obj_lock);
while(obj ! =NULL) // The object is still there
; / / loop
obj = 1; // Make objects
wakeup(obj); // Wake up the dormant consumers on it
unlock(obj_lock);
Recv:
lock(obj_lock);
while(obj == NULL) // There are no objects to consume
sleep(obj); / / sleep
obj = NULL; // Consume objects
unlock(obj_lock);
Copy the code
So, in the same case above, RecvRecvRecv gets obj_lockOBj \ _lockOBJ_lock before preparing sleepSleepSleep, SendSendSend without obj_lockobj\ _lockOBJ_lock, it is impossible to wakeupwakeupwakeup, so sleep will not miss wakeup.
If RecvRecvRecv sleeps with obj_lockobj\_lockobj_lock, it will also be deadlocked.
Therefore, the interface to sleepSleepsleep is defined as:
void sleep(void *obj, void *obj_lock);
Copy the code
Use this lock as an argument to a sleepSleepsleep to release the lock when RecvRecvRecv goes to sleep and regain it when RecVRecV returns.
We need to “release the lock after it enters sleep”. If we release the lock before that, we may have sleepsleepmiss wakeupwakeupwakeup for the same reason as before.
But how is it possible to release the lock when it’s in hibernation? We have already seen that to go to sleep is to change the current process state to SLEEPINGSLEEPINGSLEEPING and give CPUCPUCPU away. How can CPUCPUCPU release the lock? So the lock must be released before entering hibernation, which contradicts the above. Therefore, the two steps of releasing the lock and entering the sleep state must be an atomic operation, that is, another lock.
Involved in the change of the process status, that is necessarily task structure of the array that lock ptable. Lockptable. Lockptable. Lock, use it to ensure releasing dormancy object lock and enter a dormant state these two steps is an atomic operation, but xv6xv6xv6 has a little different to its processing, For a bit more sophistication, check out the full sleepsleepsleep code:
void sleep(void *chan, struct spinlock *lk)
{
struct proc *p = myproc();
if(lk ! = &ptable.lock){// If lk is not table.lock
acquire(&ptable.lock); / / get ptable. Lock
release(lk); / / release lk
}
// Go to sleep.
p->chan = chan;
p->state = SLEEPING;
sched();
// Tidy up.
p->chan = 0;
// Reacquire original lock.
if(lk ! = &ptable.lock){// If lk is not table.lock
release(&ptable.lock); / / release ptable. Lock
acquire(lk); // Retrieve the LK lock}}Copy the code
Throughout the code run up and down very symmetrical, don’t consider first dormancy object lock is the task structure lock this kind of circumstance, can see clearly the release of dormancy object lock LKLKLK and enter a dormant state. These two steps in ptable lockptable. Lockptable. Lock the lock under the protection of, It’s an atomic operation, and after the above analysis, there won’t be any complications.
A sleepsleepsleep does not logically need to release the lock, but only temporarily to avoid deadlocks. The lock is usually acquired and released in pairs within the same function, so sleepSleepsleep has to retrieve the lock on its return.
Lk =ptble. Locklk =ptble. Locklk =ptble. In the waitWaitWait function, the parent waits for the child to exit, at which point it sleeps on itself:
sleep(curproc, &ptable.lock);
Copy the code
This kind of situation from the point of sleepsleepsleep function, do not release ptable. Lockptable. Lockptable. Lock, so not a deadlock? Xv6xv6xv6 Do as follows:
static void wakeup1(void *chan)
{
struct proc *p;
for(p = ptable.proc; p < &ptable.proc[NPROC]; p++)
if(p->state == SLEEPING && p->chan == chan)
p->state = RUNNABLE;
}
void wakeup(void *chan)
{
acquire(&ptable.lock);
wakeup1(chan);
release(&ptable.lock);
}
Copy the code
The wakeupwakeupwakeup function is divided into two parts, the actual action is wakeup1wakeup1wakeup1, but it does not need to lock, So you can go back and look at the wakeup1wakeup1wakeup1 actually used to wakeup the parent process in exitExitExit so it doesn’t cause a deadlock.
Scheduling switch
The scheduling switchover process is that process AAA switches to the scheduler. The scheduler selects a process BBB according to the polling algorithm, and then switches to process BBB.
Process AAA typically switches to the scheduler by calling schedsched, which is a wrapper for SWTCHSWTCHSWTCH, Before this must get ptable. Lockptable. Lockptable. Lock the lock to ensure that the process context. Yieldyieldyield for example yieldyieldyield is usually used for rescheduling when a time slice is up. The code is as follows:
void yield(void)
{
acquire(&ptable.lock); //DOC: yieldlock
myproc()->state = RUNNABLE;
sched();
release(&ptable.lock);
}
Copy the code
In the call SWTCHSWTCHSWTCH took ptable. Lockptable. Lockptable. Lock the lock, and the current state of the process change for RUNNABLERUNNABLERUNNABLE, After that, schedschedsched is called:
void sched(void)
{
int intena;
struct proc *p = myproc();
if(! holding(&ptable.lock)) panic("sched ptable.lock");
if(mycpu()->ncli ! =1)
panic("sched locks");
if(p->state == RUNNING)
panic("sched running");
if(readeflags()&FL_IF)
panic("sched interruptible");
intena = mycpu()->intena;
swtch(&p->context, mycpu()->scheduler);
mycpu()->intena = intena;
}
Copy the code
The schedschedsched function is called whenever a rescheduling is needed, doing a lot of checking, calling schedsched the current process state should not be RUNNINGRUNNINGRUNNING, Because Schedschedsched had changed the state earlier. Interrupts should not be allowed at this time because they were closed when the LOCK was picked xv6xv6xv6 to be safe, interrupts are closed when the LOCK is picked to avoid deadlocks as described in LOCK. Intenaintenaintena indicates that CPUCPUCPU allows interrupts before pushclipushclicli. This value is saved until the process is scheduled again, ensuring that interrupts are allowed consistently in the same process. Must be made to focus on why below SWTCHSWTCHSWTCH ptable. Lockptable. Lockptable. Lock the lock
Above this total will always find the whole process of switchswitchswitch hold ptable. Lockptable. Lockptable. Lock, want to know what it will switch to carry out SWTCHSWTCHSWTCH process, thus the lock access in a process, Release in another process. It’s unusual to use locks like this, but it’s a must.
If SWTCHSWTCHSWTCH execution process owns no ptable lockptable. Lockptable. The lock? For example, if the time slot for process AAA running on CPU0CPU0CPU0 is up, call yieldyieldyield and set process A’s process to RUNNABLERUNNABLERUNNABLE, CPU0CPU0CPU0 before executing schedsched, another CPU1CPU1CPU1 might schedule process AAA, so that a process running on two CPucpucpus would have its kernel stack context messed up, Therefore, SWTCHSWTCHSWTCH must hold the lock for the entire duration of execution to ensure that the state of the process is consistent with the context.
Thus, schedschedsched picks locks before and releases them, but the two operations are not paired, with process AAA picking locks and process BBB releasing locks. The statements after schedsched are the ones that are scheduled to be executed again on CPUCPUCPU. This usually starts with a lock release, even if the process is the first one. As we saw earlier, creating a new process simulates the new process as an old one. It fills its kernel station with all sorts of fake contexts, making it look like an old process that has been executed by CPUCPUCPU. It is necessary to emulate not only the context but also the behavior, so set Eipeipeip to the address of the forkretforkretret function when filling kernel-level context:
void forkret(void)
{
static int first = 1;
release(&ptable.lock);
if (first) {
first = 0; iinit(ROOTDEV); initlog(ROOTDEV); }}Copy the code
This function does one thing if the created process is not the first: Release ptable. Lockptable. Lockptable lock, this is the imitation of an old process, because the old SWTCHSWTCHSWTCH switching the process execution context is release ptable. After lockptable. Lockptable. Lock.
But there are also ones not released after schedsched, such as the last two lines in the exitexitexit code:
curproc->state = ZOMBIE;
sched();
panic("zombie exit");
Copy the code
Schedschedsched is scheduled in exitexitExit and never returns because the process has been destroyed. Is it not a mismatch between the lock and the lock? Notice again that schedschedsched picks locks before and after locks are not paired, process AAA picks locks, process BBB releases locks, and process AAA is gone. Okay? No problem.
The runtime
Finally, a brief word about the runtime. The runtime is an extension of the standard library. How does this relate to our process? The mainmainmain function is familiar, and there are two common questions about it: Does the program start with MainmainMain? Where does mainmainMain go after it executes? Both of these problems are related to the runtime. Mainmainmain is also a function, which is called to execute, and it has to prepare arguments for it before it executes, and then it has to clean up the program after it executes, which is done by the runtime.
The runtime will initialize the environment in which the program is running, parameters, heap, IOIOIO, threads, etc., and then it will call MainMainMain to execute the body of the program, and mainmainMain will execute the heap destruction and so forth and then exitExitExit, Let’s take a look at the minimalist runtime pseudo-code
Push argv # Call main # Call main # Push eax # Call exit # Call exitCopy the code
Argvargvargv, argcargcarGC, argvargvargv, argcargCARGC, argvargvargv, argcargCARGC, argvargvargv, argvargcarGC, argvargvargV, argvargcarGC, argvargvargV, argvargcarGC Calling exitexitExit as an argument to exitExitExit causes the process to exit. Returnreturnreturn at the end of the mainmainMain function is going to make the exitExitExit system call as an argument to exitExitExit, and the system will react differently depending on the value.
Xv6xv6xv6 does not implement the runtime, and the environment preparation for mainmainmain is implicitly included in the execexecexec function, which pushes the parameters of mainmainmain into the user stack. The user program needs to display exitExitExit after execution because it has no runtime to call exitExitExit for it. A run-time library as simple as the one above is not complicated to implement, but I won’t cover it here; I’ll write about it later.
The final summary
That’s about all we have to say about processes xv6xv6xv6. It’s pretty clear how processes are created, how the first process is created, how CPUCPUCPU is scheduled, what happens when CPUCPUCPU is first accessed, what is the meaning of program loading, How processes sleep and wake up, how zombie processes orphan processes are created, does the program start from MainmainMain, where does mainmainMain go after it ends, and the difficult issue of lock synchronization.
The above questions make sense to me, but I just stand on the xv6xv6xv6 code to understand it, but in fact why to design this way, what is the benefit of this design, can not design this way, my level is not enough, some of the questions are not very clear. The granularity of locks, the timing of switch interrupts, operations that may seem redundant but are actually critical to a multiprocessor design, such as switching kernel page tables, and so on.
Therefore, in fact, what I have told you is one-sided and simple. Although XV6XV6XV6 is small, it has all the five organs. It is very difficult for me to talk about design by jumping out of the original code.
Well, that’s all for this article. If you have any questions, please criticize and correct them. Welcome to exchange your learning progress with me.