• Killing a process and all of its descendants
  • Original article by Igor_Sarcevic
  • Translation from: The Gold Project
  • Permalink to this article: github.com/xitu/gold-m…
  • Translator: Jiang Wu Slag
  • Proofreader: TokenJan, PortandBridge

How do I kill a process and all of its children

Killing processes on Unix-like systems is trickier than you might expect. Last week I was debugging a problem with stopping jobs in Semaphore. More specifically, the problem was stopping a process that was running inside a job. Here’s what I learned:

  • Unix-like operating systems have complex relationships between processes: parent and child, process group, session, and session leader. The details are not uniform across operating systems such as Linux and macOS, but POSIX-compliant systems support sending a signal to a whole process group by passing a negative PID.
  • Signaling every process in a session is not straightforward with system calls alone.
  • A child process started with exec inherits its parent’s signal configuration. For example, if the parent ignores SIGHUP, the child ignores it too.
  • The answer to the question “what happens to orphaned process groups” is not simple.

Killing the parent does not kill its children

Each process has a parent process. We can see this with the pstree or ps tools.

# Start two dummy processes
$ sleep 100 &
$ sleep 101 &

$ pstree -p
init(1)-+
        |-bash(29051)-+-pstree(29251)
                      |-sleep(28919)
                      `-sleep(28964)

$ ps j -A
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0     1     1     1 ?           -1 Ss       0   0:03 /sbin/init
29051  1470  1470 29051 pts/2     2386 SN    1000   0:00 sleep 100
29051  1538  1538 29051 pts/2     2386 SN    1000   0:00 sleep 101
29051  2386  2386 29051 pts/2     2386 R+    1000   0:00 ps j -A
    1 29051 29051 29051 pts/2     2386 Ss    1000   0:00 -bash

The ps output shows each process’s PID (process ID) and PPID (parent process ID).

I had a false assumption about the relationship between parent and child processes. I thought that killing the parent would also kill all of its children. This is wrong. Instead, the children become orphans, and the init process adopts them as their new parent.

Let’s watch this re-parenting happen by terminating the bash process (the current parent of the sleep commands).

$ kill 29051 # Kill bash process

$ pstree -p
init(1)-+
        |-sleep(28919)
        `-sleep(28964)
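The same experiment can be scripted without killing your own login shell. The following is only a sketch, not something from the original investigation: it assumes Linux (it reads /proc) and a POSIX sh, and every variable name in it is made up for the illustration.

```shell
# A throwaway parent shell starts a background sleep, prints
# "<its own PID> <the sleep's PID>", and exits right away.
set -- $(sh -c 'sleep 30 >/dev/null 2>&1 & echo "$$ $!"')
parent_pid=$1
child_pid=$2

# The parent is gone by now, yet the child is still there.
if kill -0 "$child_pid" 2>/dev/null; then
    child_state=alive
else
    child_state=dead
fi

# Linux-only assumption: the 4th field of /proc/PID/stat is the parent PID.
read -r s_pid s_comm s_state new_ppid s_rest < "/proc/$child_pid/stat"

echo "child $child_pid is $child_state; parent was $parent_pid, now $new_ppid"

# Clean up the orphan.
kill "$child_pid"
```

The new parent is usually PID 1, but inside a container or under a process supervisor it may be another "subreaper" process, so the sketch only reports the value instead of assuming it.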

This re-parenting seemed strange to me. For example, when I SSH into a server, start a process, and then log out, the process I started is terminated. I wrongly assumed that this was the default behavior on Linux. In fact, the termination of processes when an SSH session ends has to do with process groups, the session leader, and the controlling terminal.

What are process groups and session leaders?

Let’s look again at the output of the ps j command in the above example.

$ ps j -A
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
    0     1     1     1 ?           -1 Ss       0   0:03 /sbin/init
29051  1470  1470 29051 pts/2     2386 SN    1000   0:00 sleep 100
29051  1538  1538 29051 pts/2     2386 SN    1000   0:00 sleep 101
29051  2386  2386 29051 pts/2     2386 R+    1000   0:00 ps j -A
    1 29051 29051 29051 pts/2     2386 Ss    1000   0:00 -bash

In addition to the parent-child relationship represented by PPID and PID, there are two other relationships between processes:

  • A process group represented by a PGID
  • The session represented by the SID

We can see process groups in shells that support job control, such as bash and zsh, which create a process group for every pipeline of commands. A process group is a collection of one or more processes (usually associated with a job) that can receive signals from the same terminal. Each process group has a unique process group ID.

# Start a process group consisting of the tail and grep commands
$ tail -f /var/log/syslog | grep "CRON" &

$ ps j
 PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
29051 19701 19701 29051 pts/2    19784 SN    1000   0:00 tail -f /var/log/syslog
29051 19702 19701 29051 pts/2    19784 SN    1000   0:00 grep CRON
29051 19784 19784 29051 pts/2    19784 R+    1000   0:00 ps j
29050 29051 29051 29051 pts/2    19784 Ss    1000   0:00 -bash

Notice that tail and grep have the same PGID in the listing above.

A session is a collection of process groups, usually associated with a controlling terminal and a session leader process. If a session has a controlling terminal, it has a single foreground process group, and all other process groups in the session are background process groups.

Not every bash process is a session leader, but when you SSH into a remote server you usually get a new session. When bash runs as the session leader, it propagates the SIGHUP signal to its children. This SIGHUP propagation is the core reason why I long believed that children die along with their parents.

Sessions are not implemented consistently across Unix systems

In the previous examples you can see the SID column, the session ID of the process. It is the ID shared by all processes in a session.

However, keep in mind that not all Unix systems follow this implementation. The Single UNIX Specification talks only about a “session leader”; it has no “session ID” comparable to a process ID or a process group ID. A session leader is a single process with a unique process ID, so we can treat the session leader’s process ID as the session ID.

System V Release 4 introduced session IDs.

In practice, this means that on Linux the ps output shows a session ID, but on BSD and its variants (such as macOS) the session ID is either missing or always zero.

Killing all processes in a process group or session

We can use the PGID to signal the entire process group with the kill command:

$ kill -SIGTERM -- -19701

Here the negative number -19701 addresses the process group. A positive number is treated as a process ID, and the signal goes to that single process. A negative number is treated as a PGID, and the signal goes to every process in the group. The -- stops option parsing so that the negative number is not read as an option.

The negative number comes directly from the definition of the kill system call.
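To try the negative-PID form on a disposable process group, something like the sketch below can be used. It is an illustration under assumptions rather than a reference implementation: it relies on Linux /proc and on setsid(1) from util-linux, and the is_running helper and all variable names are hypothetical. It uses the portable spelling kill -s TERM, which does the same as the kill -SIGTERM form shown above.

```shell
# Linux-only helper: a PID counts as running if /proc lists it
# and it is not a zombie waiting to be reaped.
is_running() {
    [ -r "/proc/$1/stat" ] || return 1
    read -r s_pid s_comm s_state s_rest < "/proc/$1/stat" || return 1
    [ "$s_state" != "Z" ]
}

# setsid(1) runs the helper shell in a new session, which also makes it the
# leader of a new process group, so the PGID equals the helper's PID.
# The two sleeps are started without job control and stay in that group.
set -- $(setsid sh -c '
    sleep 30 >/dev/null 2>&1 & first=$!
    sleep 31 >/dev/null 2>&1 & second=$!
    echo "$$ $first $second"')
pgid=$1
first_pid=$2
second_pid=$3

# One call signals every member of the group.
kill -s TERM -- "-$pgid"
sleep 1

for pid in "$first_pid" "$second_pid"; do
    if is_running "$pid"; then
        echo "$pid is still running"
    else
        echo "$pid was terminated"
    fi
done
```

The group outlives its leader: the helper shell exits immediately, but the PGID stays valid for as long as any member of the group is alive.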

Killing all processes in a session is quite different. As mentioned in the previous section, some systems have no notion of a session ID. Even systems that do have session IDs, such as Linux, provide no system call that signals every process in a session. You need to walk the process tree in /proc, collect the processes whose SID matches, and terminate them one by one.

pgrep implements the algorithm for walking, collecting, and selecting processes by session ID, and its companion pkill kills them. Use the following command:

pkill -s <SID>
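For systems without pkill, the walk over /proc described above can be approximated by hand. The sketch below is a rough, Linux-only illustration and not how pgrep is actually implemented: the pids_in_session function is hypothetical, it assumes the /proc/PID/stat layout "PID (comm) state ppid pgrp session ...", and it uses setsid(1) only to create a throwaway session for the demo.

```shell
# Print the PIDs of all live processes whose session ID equals $1.
pids_in_session() {
    wanted_sid=$1
    for stat_file in /proc/[0-9]*/stat; do
        # A process may vanish while we iterate; skip it if so.
        stat_line=$(cat "$stat_file" 2>/dev/null) || continue
        pid=${stat_line%% *}
        # Drop everything up to the last ") ", since comm may contain spaces.
        fields=${stat_line##*) }
        # Remaining fields: state ppid pgrp session ...
        set -- $fields
        if [ "$1" != "Z" ] && [ "$4" = "$wanted_sid" ]; then
            echo "$pid"
        fi
    done
}

# Demo: a helper shell becomes the leader of a new session, leaves one
# sleep behind in that session, and prints its own PID (which is the SID).
sid=$(setsid sh -c 'sleep 30 >/dev/null 2>&1 & echo $$')

before=$(pids_in_session "$sid")
for pid in $before; do
    kill -s TERM "$pid"
done
sleep 1
after=$(pids_in_session "$sid")

echo "session $sid had: $before; now has: ${after:-nothing}"
```

Between collecting the PIDs and signaling them, new processes can still appear in the session, so a robust tool would probably repeat the walk until it comes back empty.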

Signals ignored by nohup stay ignored in child processes

Ignored signals, such as the SIGHUP that nohup ignores, are inherited by all descendants of a process. This inheritance was the final stumbling block in my bug hunt last week.

My program is an agent that runs bash commands, and I had verified that it sets up a bash session with a controlling terminal. That bash process is the session leader for the processes started in the session. My process tree looks like this:

agent -+
       +- bash (session leader) -+
                                 | - process1
                                 | - process2

I assumed that when I killed the bash session with SIGHUP, its children would terminate too. Integration tests on the agent confirmed this.

However, I had overlooked that the agent itself is started with nohup. When you start a subprocess with exec, as the agent does when it starts bash, the subprocess inherits the signal dispositions of its parent, including ignored signals.

This last conclusion took me by surprise.
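The effect is easy to reproduce in isolation. The sketch below is a simplified stand-in for the agent setup, not the agent's real code: a subshell ignores SIGHUP the way nohup does and then execs sleep, and the variable names are invented for the example.

```shell
# Roughly what nohup does: ignore SIGHUP, then exec the real command.
# The "ignore" disposition survives the exec.
( trap '' HUP; exec sleep 30 ) &
child_pid=$!

sleep 1                        # give the child time to exec
kill -s HUP "$child_pid"       # silently ignored by the child
sleep 1
kill -s TERM "$child_pid"      # SIGTERM is not ignored
wait "$child_pid"
exit_status=$?

# The shell reports 128 + signal number for a child killed by a signal:
# 129 would mean SIGHUP killed it, 143 means only SIGTERM did.
echo "child exit status: $exit_status"
```

Without the trap line, the same SIGHUP would normally terminate the sleep, unless the script itself was started with SIGHUP already ignored, which is exactly the situation described above.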
