A technical article that only discusses theory, without building something of its own, is just armchair theorizing, no matter how well it is written. My previous article, “Inside the Shell pipe”, was well received by many readers, so this time we will implement the pipe feature ourselves. We will support complex commands such as the one below: a long chain of commands strung together with pipes.

$ cmd1 | cmd2 | cmd3 | cmd4 | cmd5

We’re going to use Python, because Go and Java do not offer a plain fork() call. What we ultimately want to build is the process structure in this diagram. The structure itself is simple, but constructing it takes a fair amount of work.

The program file is named pipe.py, and it is run like this:

python pipe.py "cat pipe.py | grep def | wc -l"

This counts the lines of pipe.py that contain the word def and prints:

3

Command execution

In a pipeline, each command is attached to at least one pipe: one on its left, one on its right, or both. The first and last commands have only one pipe each, while every command in the middle has two. A pipe is identified by its pair of descriptors (R, W): the read end and the write end.
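To make the first/middle/last rule concrete, here is a small sketch using a hypothetical helper, pipe_layout, which is not part of pipe.py. It assumes the n - 1 pipes of an n-command chain are numbered so that pipe i connects command i to command i + 1, and lists which pipe sits on each command's left and right (None means no pipe on that side).

```python
def pipe_layout(n):
    """Return (left, right) pipe indices for each of n chained commands.

    Pipe i connects command i to command i + 1, so there are n - 1 pipes.
    None means the command has no pipe on that side.
    """
    layout = []
    for i in range(n):
        left = i - 1 if i > 0 else None       # pipe from the previous command
        right = i if i < n - 1 else None      # pipe to the next command
        layout.append((left, right))
    return layout


# cmd1 | cmd2 | cmd3 | cmd4 | cmd5
for i, (left, right) in enumerate(pipe_layout(5), start=1):
    print(f"cmd{i}: left={left} right={right}")
# cmd1: left=None right=0
# cmd2: left=0 right=1
# cmd3: left=1 right=2
# cmd4: left=2 right=3
# cmd5: left=3 right=None
```

Only cmd1 and cmd5 have a single pipe; the three middle commands each have two.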

The read end of the left pipe, left_pipe[0], becomes the process's standard input, and the write end of the right pipe, right_pipe[1], becomes its standard output. Once the descriptors are in place and the unused ends are closed, the process calls exec to replace itself with the command.

import os
import sys

def run_cmd(cmd, left_pipe, right_pipe):
    if left_pipe:
        # The read end of the left pipe becomes stdin
        os.dup2(left_pipe[0], sys.stdin.fileno())
        os.close(left_pipe[0])
        os.close(left_pipe[1])
    if right_pipe:
        # The write end of the right pipe becomes stdout
        os.dup2(right_pipe[1], sys.stdout.fileno())
        os.close(right_pipe[0])
        os.close(right_pipe[1])
    # Split the command into its arguments (on whitespace, no quote handling)
    args = cmd.split()
    try:
        # Pass the command name and the argument list;
        # the first element of the argument list is the command name itself
        os.execvp(args[0], args)
    except OSError as ex:
        # stdout may already be a pipe, so report errors on stderr
        print("exec error:", ex, file=sys.stderr)
        os._exit(1)
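The redirect-then-exec pattern in run_cmd can be tried on a single command. The sketch below uses a hypothetical helper, capture_output, that is not part of pipe.py: it forks a child, points the child's stdout at a pipe with os.dup2, runs the command with os.execvp, and lets the parent read everything from the pipe's read end. The parent must close its own copy of the write end, otherwise its reads never reach end-of-file.

```python
import os


def capture_output(args):
    """Run args (e.g. ["echo", "hi"]) in a child process and return its stdout as bytes."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: make fd 1 (stdout) refer to the pipe's write end
        os.dup2(write_fd, 1)
        # The original descriptors are no longer needed once fd 1 points at the pipe
        os.close(read_fd)
        os.close(write_fd)
        try:
            os.execvp(args[0], args)  # replaces the child; returns only on error
        except OSError:
            os._exit(127)
    # Parent: close its write end so read() sees EOF when the child exits
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # empty bytes means end-of-file
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return b"".join(chunks)


print(capture_output(["echo", "hello"]))  # b'hello\n'
```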

Process relationship

To run several commands at the same time, the shell must use fork to create child processes and have them execute the commands.
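The key property of os.fork is that it returns twice: 0 in the child, and the child's process ID in the parent. A minimal sketch of just that, using a hypothetical fork_demo function that is not part of pipe.py; the child sends its own PID and its parent's PID back through a pipe so the parent can compare them.

```python
import os


def fork_demo():
    """Fork once; return (pid seen by parent, child's getpid, child's getppid, parent's pid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0 here; report our PID and our parent's PID
        os.close(read_fd)
        os.write(write_fd, f"{os.getpid()} {os.getppid()}".encode())
        os._exit(0)  # exit without running atexit handlers or flushing inherited buffers
    # Parent: fork() returned the child's PID
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    child_pid, child_ppid = map(int, data.split())
    return pid, child_pid, child_ppid, os.getpid()


pid, child_pid, child_ppid, me = fork_demo()
print(pid == child_pid, child_ppid == me)  # True True
```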

# Parameters: the list of commands, and the pipe to the left of the first one
def run_cmds(cmds, left_pipe):
    # Take the first command in the list; this call will run it
    cur_cmd = cmds[0]
    other_cmds = cmds[1:]
    # Create the pipe to the next command, if there is one
    pipe_fds = ()
    if other_cmds:
        pipe_fds = os.pipe()
    # Create a child process (os.fork raises OSError on failure)
    try:
        pid = os.fork()
    except OSError as ex:
        print("fork process failed:", ex, file=sys.stderr)
        return
    if pid > 0:
        # The parent process runs the current command,
        # passing both left and right pipes (either may be empty)
        run_cmd(cur_cmd, left_pipe, pipe_fds)
    elif other_cmds:
        # Don't forget to close descriptors the child no longer uses
        if left_pipe:
            os.close(left_pipe[0])
            os.close(left_pipe[1])
        # The child recursively runs the remaining commands, carrying the new pipe
        run_cmds(other_cmds, pipe_fds)
    # When no commands remain, the child has nothing to do and simply returns

The startup script

The command-line argument is split on the vertical bar into individual commands, which are then executed recursively:
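Splitting on "|" with str.split is enough for this example, but it also splits a "|" that sits inside quotes, as in grep 'a|b'. As a swapped-in alternative (not what pipe.py does), the standard library's shlex module can tokenize the line with POSIX-style quoting and treat "|" as a separator. The helper name split_pipeline is hypothetical, and the sketch assumes Python 3.8 or later, where whitespace_split also splits at punctuation characters.

```python
import shlex


def split_pipeline(cmdtext):
    """Split 'cmd1 | cmd2 ...' into a list of argument lists, respecting quotes."""
    lexer = shlex.shlex(cmdtext, posix=True, punctuation_chars="|")
    lexer.whitespace_split = True  # split words on whitespace (and on "|")
    cmds, current = [], []
    for token in lexer:
        if token == "|":
            cmds.append(current)  # end of one command
            current = []
        else:
            current.append(token)
    cmds.append(current)
    return cmds


print(split_pipeline("cat pipe.py | grep def | wc -l"))
# [['cat', 'pipe.py'], ['grep', 'def'], ['wc', '-l']]
print(split_pipeline("echo 'a|b' | cat"))
# [['echo', 'a|b'], ['cat']]
```

If main used a helper like this, run_cmd would receive ready-made argument lists and could skip its own cmd.split(); that wiring is not shown here.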

def main(cmdtext):
    cmds = [cmd.strip() for cmd in cmdtext.split("|")]
    # There is no pipe to the left of the first command
    run_cmds(cmds, ())

if __name__ == '__main__':
    main(sys.argv[1])

Observe process relationships

The commands in the example finish too quickly to observe their process relationships with the ps command. So we add a line of debugging output that prints the command being executed, the process ID and the parent process ID.

def run_cmd(cmd, left_pipe, right_pipe):
    # flush=True: exec replaces the process without flushing Python's buffers
    print(cmd, os.getpid(), os.getppid(), flush=True)
    ...

Observe the output as you run the script

$ python pipe.py "cat pipe.py | grep def | wc -l"
cat pipe.py 49782 4503
grep def 49783 49782
wc -l 49784 49783
3

The parent-child relationship is clear from the output: the process running the Nth command is the parent of the process running the (N+1)th command. In run_cmds, after forking, the parent process runs the current command and hands the remaining commands to the child, which produces this chain of processes. As an exercise, swap the roles so that the child process runs the current command, then look at the output again:

$ python pipe.py "cat pipe.py | grep def | wc -l"
cat pipe.py 49949 49948
grep def 49950 49948
wc -l 49951 49948
3

You’ll notice that all three command processes now share the same parent, the Python process. As the diagram above shows, this is how a shell arranges the processes of a pipeline, and the flat structure is logically clearer.
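The swapped variant suggested above can be sketched as follows, assuming the same pipe handling as run_cmd. The child executes the current command and the parent recurses, so every command process is a direct child of the Python process. Unlike the article's version, this sketch (hypothetical name run_cmds_flat) also returns the child PIDs so the caller can wait for them with os.waitpid, as a shell does before printing its next prompt.

```python
import os
import sys


def run_cmds_flat(cmds, left_pipe=(), pids=None):
    """Run cmds as a pipeline where every command is a child of this process.

    The child executes the current command; the parent recurses for the rest.
    Returns the child PIDs in command order so the caller can wait for them.
    """
    if pids is None:
        pids = []
    cur_cmd, other_cmds = cmds[0], cmds[1:]
    right_pipe = os.pipe() if other_cmds else ()
    pid = os.fork()
    if pid == 0:
        # Child: wire the pipes to stdin/stdout, as run_cmd does, then exec
        if left_pipe:
            os.dup2(left_pipe[0], 0)
            os.close(left_pipe[0])
            os.close(left_pipe[1])
        if right_pipe:
            os.dup2(right_pipe[1], 1)
            os.close(right_pipe[0])
            os.close(right_pipe[1])
        args = cur_cmd.split()
        try:
            os.execvp(args[0], args)
        except OSError as ex:
            print("exec error:", ex, file=sys.stderr)
            os._exit(127)
    # Parent: the left pipe is now held by the two children that use it, so
    # close the parent's copies; otherwise its reader would never see EOF
    if left_pipe:
        os.close(left_pipe[0])
        os.close(left_pipe[1])
    pids.append(pid)
    if other_cmds:
        run_cmds_flat(other_cmds, right_pipe, pids)
    return pids


pids = run_cmds_flat(["echo hello", "tr a-z A-Z"])
for pid in pids:
    os.waitpid(pid, 0)  # like a shell, wait for the whole pipeline
# the terminal shows: HELLO
```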

For the complete source code above, follow the WeChat public account “Code Hole” and reply “pipe” there.