Have you ever wondered what the Unix shell does when you execute a command? How does the shell understand and interpret commands? What happens behind the scenes? For example, what does the shell do when we run ls -l *.py? Knowing the answers helps you use a Unix system more effectively, so today we will take a look.
What is a shell
A shell is usually a command-line interface that exposes the operating system’s services to human users or other programs. After it starts, the shell typically displays a prompt and waits for input from the user. The following figure shows basic UNIX and Windows shell prompts.
So the shell prompts the user, and the user enters a command. How does the shell take that command and interpret it? To understand this, let’s break the process into four steps:
- Get and parse user input
- Identify the command and its parameters
- Find the command
- Execute the command
Now let’s expand on each step:
1. Get and parse user input
For example, if you type ls -l *.py and press Enter, the shell calls a function named getline(), declared in <stdio.h>, to read the user’s input. The input arrives on the standard input stream. Once Enter is pressed to mark the end of the line, getline() stores the input string, including the trailing newline, in a buffer.
ssize_t getline(char **restrict lineptr, size_t *restrict n, FILE *restrict stream);
Function parameter description:
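- lineptr: address of the buffer pointer; getline() may reallocate the buffer if the line does not fit
- n: address of the buffer size
- stream: the stream to read from, here the standard input stream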
Now let’s look at the code:
char *input_buffer;
size_t b_size;
b_size = 32; // size of the buffer
input_buffer = malloc(sizeof(char) * b_size); // the buffer to store the user input
getline(&input_buffer, &b_size, stdin); // gets the line and stores it in input_buffer
Once the user presses Enter, getline() returns, having stored the command string entered by the user in input_buffer. Now that the shell has the user input, what’s the next step?
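The snippet above ignores the return value of getline(). As a minimal sketch, assuming a helper named read_line() (a hypothetical name, not part of the original code), the read step with end-of-file handling might look like this. Passing a NULL buffer with size 0 lets getline() allocate the buffer itself.

```c
#define _POSIX_C_SOURCE 200809L /* exposes getline() on strict compilers */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * read_line (hypothetical helper): reads one line from stream.
 * Returns a malloc'd string the caller must free, or NULL on EOF or error.
 */
char *read_line(FILE *stream)
{
    char *line = NULL; /* NULL with size 0 asks getline() to allocate */
    size_t size = 0;

    if (getline(&line, &size, stream) == -1)
    {
        free(line); /* getline() may have allocated even on failure */
        return (NULL);
    }
    return (line); /* still ends with '\n' if the input line had one */
}
```

A shell would call read_line(stdin) once per prompt and treat a NULL result (for example after Ctrl-D) as a request to quit.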
2. Identify the command and its parameters
Now the shell knows that you typed the string ‘ls -l *.py’, but it also needs to know which part is the command and which parts are its arguments. That job falls to the function strtok(), declared in <string.h>.
strtok() splits a string into tokens at delimiter characters, which in this case are whitespace. A space tells strtok() that a word has ended. The first token in input_buffer is the command (ls), and the remaining tokens (-l and *.py) are arguments to the command. Once the shell has split the string into tokens, it stores them in an array for later use.
char *strtok(char *restrict str, const char *restrict delim);
Parameter Description:
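- str: the string to tokenize on the first call; NULL on later calls to continue with the same string
- delim: the set of delimiter characters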
The function strtok() takes a string and delimiter as arguments and returns a pointer to the token string. The specific execution code is as follows:
char *input_buffer, *args, *delim_args, *command_argv[50];
int i;
i = 0;
delim_args = " \t\r\n\v\f"; // the delimiters
args = strtok(input_buffer, delim_args); // stores the token inside args
while (args)
{
command_argv[i] = args; // stores the token in command_argv
args = strtok(NULL, delim_args);
i++;
}
command_argv[i] = NULL; // sets the last entity of command_argv to NULL
command_argv now holds the tokens of the command line:
command_argv[0] = "ls"
command_argv[1] = "-l"
command_argv[2] = "*.py"
command_argv[3] = NULL
command_argv[0] is the command, the others are its arguments, and the final NULL marks the end of the list. Now that the command string has been split apart, the next step is to find the command.
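The tokenizing loop above writes into a fixed array of 50 pointers without checking the index. A minimal sketch of the same idea with a bounds check, assuming a helper named split_line() (hypothetical, not from the original code):

```c
#include <stddef.h>
#include <string.h>

/*
 * split_line (hypothetical helper): splits line in place on whitespace and
 * stores at most max_args - 1 token pointers in argv, followed by NULL.
 * Returns the number of tokens stored.
 */
size_t split_line(char *line, char **argv, size_t max_args)
{
    const char *delims = " \t\r\n\v\f";
    size_t count = 0;
    char *token;

    if (max_args == 0)
        return (0);
    token = strtok(line, delims);
    while (token != NULL && count < max_args - 1)
    {
        argv[count++] = token; /* points into line, no copy is made */
        token = strtok(NULL, delims);
    }
    argv[count] = NULL; /* execve() expects a NULL-terminated list */
    return (count);
}
```

Because strtok() overwrites delimiters with '\0', the tokens stay valid only as long as the input buffer does. Note also that *.py is stored literally, as in the listing above; this sketch does no wildcard expansion.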
3. Find the command
From step 2 the shell knows that the user wants to run ls, so where does it find this command? The shell consults the environment variable PATH, which lists the directories that hold executable commands.
However, PATH usually contains more than one directory, separated by colons.
How does the shell find ls among so many directories? It checks each one with the access() function, declared in <unistd.h>:
int access(const char *pathname, int mode);
Description of the parameters and return value:
- pathname: path of the file or executable
- mode: the check to perform; we use X_OK, which tests whether the file exists and the calling process may execute it
- Return value: 0 if the check succeeds, -1 otherwise
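As a small illustration of the call described above, here is a sketch that wraps access() in a yes/no check. The name is_executable() is an assumption made for this example.

```c
#include <unistd.h>

/*
 * is_executable (hypothetical helper): returns 1 if path exists and the
 * calling process has execute permission on it, 0 otherwise.
 */
int is_executable(const char *path)
{
    return (access(path, X_OK) == 0);
}
```

One caveat: X_OK also succeeds for a directory the process is allowed to search, so a stricter shell would additionally confirm with stat() that the path is a regular file.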
{
char *path_buff, *path_dup, *paths, *path_env_name, *path[50];
int i;
i = 0;
path_env_name = "PATH";
path_buff = getenv(path_env_name); /* get the variable of PATH environment */
path_dup = _strdup(path_buff); /* this function is found below */
paths = strtok(path_dup, ":"); /* tokenizes it */
while (paths)
{
path[i] = paths;
paths = strtok(NULL, ":");
i++;
}
path[i] = NULL; /* terminates it with NULL */
}
/**
* _strdup - duplicates a string
* @from: the string to be duplicated
*
* Return: pointer to the duplicated string, or NULL if malloc fails
*/
char *_strdup(char *from)
{
int i, len;
char *dup_str;
len = _strlen(from) + 1; /* _strlen is not shown; assumed to behave like strlen() */
dup_str = malloc(sizeof(char) * len); /* one byte per character plus the terminator */
if (dup_str == NULL)
return (NULL);
i = 0;
while (*(from + i) != '\0')
{
*(dup_str + i) = *(from + i);
i++;
}
*(dup_str + i) = '\0';
return (dup_str);
}
The path array in the above code stores every directory listed in PATH and is terminated with NULL. The shell can therefore join each directory with the command name and check whether the result is executable using the access() function:
{
char *command_file, *command_path, *path[50];
int i, stat_f;
i = 0;
command_path = malloc(sizeof(char) * 50); /* assumes directory + '/' + command fits in 50 bytes */
while (path[i] != NULL)
{
_strcat(path[i], command_file, command_path); /* builds "directory/command"; this function is found below */
stat_f = access(command_path, X_OK); /* checks whether it exists and is executable */
if (stat_f == 0)
return (command_path); /* returns the concatenated string if found */
i++;
}
free(command_path); /* nothing matched, so release the scratch buffer */
return NULL; /* otherwise returns NULL */
}
/**
* _strcat - concatenates two strings and saves it to a blank string
* @path: the path string
* @command: the command
* @command_path: the string to store the concatenation
*
* Return: Always void
*/
void _strcat(char *path, char *command, char *command_path)
{
int i, j;
i = 0;
j = 0;
while (*(path + i) != '\0')
{
*(command_path + i) = *(path + i);
i++;
}
*(command_path + i) = '/';
i++;
while (*(command + j) != '\0')
{
*(command_path + i) = *(command + j);
i++;
j++;
}
*(command_path + i) = '\0';
}
Once the command is found, the full path to the command is returned. Otherwise NULL is returned, and the shell displays an error indicating that the command does not exist.
Now suppose the command is found. Then what?
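The two fragments above can be combined into a single lookup. The following is a sketch under stated assumptions rather than the original implementation: find_in_path() is a hypothetical name, it walks the PATH string with strchr() instead of strtok() so the input is not modified, and it sizes each candidate buffer exactly instead of using a fixed 50 bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * find_in_path (hypothetical helper): looks for command in each directory
 * of the colon-separated path_env. Returns a malloc'd full path the caller
 * must free, or NULL if no executable candidate was found.
 */
char *find_in_path(const char *command, const char *path_env)
{
    const char *start = path_env;

    while (start != NULL && *start != '\0')
    {
        const char *end = strchr(start, ':');
        size_t dir_len = end ? (size_t)(end - start) : strlen(start);

        if (dir_len > 0) /* this sketch skips empty PATH entries */
        {
            size_t full_len = dir_len + strlen(command) + 2; /* '/' and '\0' */
            char *candidate = malloc(full_len);

            if (candidate == NULL)
                return (NULL);
            snprintf(candidate, full_len, "%.*s/%s", (int)dir_len, start, command);
            if (access(candidate, X_OK) == 0)
                return (candidate); /* first match wins, as in the loop above */
            free(candidate);
        }
        start = end ? end + 1 : NULL; /* move past the ':' or stop */
    }
    return (NULL);
}
```

A typical call would be find_in_path("ls", getenv("PATH")). A NULL result from getenv() is handled by the loop condition and simply yields NULL.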
4. Run the command
Once the command is found, it’s time to execute it. The question is how?
The answer is execve(), declared in <unistd.h>:
int execve(const char *pathname, char *const argv[],
char *const envp[]);
Parameter Description:
- pathname: the full path of the executable file
- argv: the NULL-terminated argument list; by convention argv[0] is the command name
- envp: the NULL-terminated list of environment variables
execve() replaces the current process image with the found command. On success it never returns; it returns -1 only if the command could not be executed.
But if the shell itself simply called execve(), there would be a problem: the shell process would be replaced by the command, and once the command finished there would be no shell left to prompt for the next command. To solve this, the shell executes commands in a child process. Once the command finishes in the child, the parent process is notified and its program flow continues. So to execute the command, the shell first creates a child process using fork(), declared in <unistd.h>:
pid_t fork(void);
fork() creates a new process by copying the calling process. The new process is called the child process, and the calling process is called the parent process. fork() returns the child’s process ID in the parent, 0 in the child, and -1 in the parent if no child could be created:
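{
char *command, *command_argv[50], **env; /* command: full path from step 3; env: the environment list */
pid_t child_pid;
int status;
get_each_command_argv(command_argv, input_buffer); /* this function is found below */
child_pid = fork();
if (child_pid == -1) /* fork failed */
return (0);
if (child_pid == 0) /* child process */
{
if (execve(command, command_argv, env) == -1)
_exit(127); /* execve failed: end the child so it does not continue as a second shell */
}
else /* parent process */
wait(&status);
}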
/**
* get_each_command_argv - stores all the arguments
* of the input command to the list
* @command_argv: the command argument list
* @input_buffer: the input buffer
*
* Return: Always void
*/
void get_each_command_argv(char **command_argv, char *input_buffer)
{
char *args, *delim_args;
int i;
delim_args = " \t\r\n\v\f";
args = strtok(input_buffer, delim_args);
i = 0;
while (args)
{
command_argv[i] = args;
args = strtok(NULL, delim_args);
i++;
}
command_argv[i] = NULL;
}
The shell uses wait(), declared in <sys/wait.h>, to wait for the state of the child process to change. Only then does program flow continue and the shell prompt the user again.
pid_t wait(int *wstatus);
- wstatus: a pointer to an integer that receives information about how the child process terminated.
The shell executes the command within the child process and then calls wait() to wait for the child process to complete. The user therefore sees the output of the command first and can type another command once the shell displays its prompt again.
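To show how the status integer can be decoded, here is a minimal sketch of the fork, execve and wait sequence in one function. The name run_command() is hypothetical. The sketch uses waitpid(), a variant of wait() that waits for one specific child, and it passes the parent’s own environment through the standard environ variable. The exit code 127 for a failed execve() follows a common shell convention and is a choice made for this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ; /* the environment of the current process */

/*
 * run_command (hypothetical helper): runs the program at path with argv in
 * a child process. Returns the child's exit status, 127 if execve() failed,
 * or -1 if fork()/waitpid() failed or the child was killed by a signal.
 */
int run_command(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return (-1);
    if (pid == 0) /* child */
    {
        execve(path, argv, environ); /* returns only on failure */
        perror(path);
        _exit(127); /* end the child without running the parent's cleanup */
    }
    if (waitpid(pid, &status, 0) == -1) /* parent blocks until the child ends */
        return (-1);
    if (WIFEXITED(status))
        return (WEXITSTATUS(status)); /* normal exit: extract the exit code */
    return (-1);
}
```

For instance, with char *args[] = {"ls", "-l", NULL}, the call run_command("/bin/ls", args) would list the current directory and return the exit status of ls, assuming ls lives at /bin/ls on that system.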
So the output of ls -l *.py is displayed while the child process runs, and because the shell waits for the child to finish, the output is complete before the next prompt appears. The shell can then display its prompt and wait for user input again. This loop continues until the user types exit.
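Putting the four steps together, the whole loop can be sketched in one function. This is an illustration under stated assumptions, not the original author’s shell: shell_loop() and MAX_ARGS are hypothetical names, and for brevity the sketch calls execvp(), which performs the PATH search itself, instead of the manual access() search followed by execve() described above.

```c
#define _POSIX_C_SOURCE 200809L /* exposes getline() on strict compilers */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 64 /* hypothetical limit on tokens per command line */

/*
 * shell_loop (hypothetical helper): reads commands from in until EOF or
 * "exit", running each one in a child process. Prompts are written to out.
 * Returns the number of child processes it launched.
 */
int shell_loop(FILE *in, FILE *out)
{
    const char *delims = " \t\r\n\v\f";
    char *line = NULL, *argv[MAX_ARGS], *token;
    size_t size = 0;
    int launched = 0;

    for (;;)
    {
        int argc = 0;
        pid_t pid;

        fputs("$ ", out); /* step 0: show the prompt */
        fflush(out);
        if (getline(&line, &size, in) == -1) /* step 1: read a line */
            break; /* EOF or read error */
        for (token = strtok(line, delims); token && argc < MAX_ARGS - 1;
             token = strtok(NULL, delims)) /* step 2: split into tokens */
            argv[argc++] = token;
        argv[argc] = NULL;
        if (argc == 0)
            continue; /* blank line: prompt again */
        if (strcmp(argv[0], "exit") == 0)
            break;
        pid = fork();
        if (pid == 0) /* child: steps 3 and 4 */
        {
            execvp(argv[0], argv); /* searches PATH, then replaces the child */
            perror(argv[0]);
            _exit(127);
        }
        if (pid > 0) /* parent; a failed fork() just falls through */
        {
            waitpid(pid, NULL, 0); /* block until the command finishes */
            launched++;
        }
    }
    free(line);
    return (launched);
}
```

Calling shell_loop(stdin, stdout) from main() gives an interactive prompt. An unknown command still counts as launched, because the child is created before execvp() discovers that the command does not exist.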
The last word
This article walked through how a shell gets from parsing user input to executing the command. If you found it helpful, please like and share it. Thank you for your support.