A few days ago, an online service that uses Google's Puppeteer to generate images failed, leaving thousands of orphaned zombie processes in each Docker container.
This article is fairly long. It covers the following questions.
- When do zombie processes and orphan processes occur
- How Puppeteer starts its processes, and an analysis of the production incident
- What is special about the process with PID 1
- Why Node/npm should not be the PID 1 process in an image
- Why Bash can be used as the PID 1 process, and what its drawbacks are in that role
- What the recommended init process for an image is
Puppeteer is a Node library that provides access to the Chrome API, letting developers start Chrome processes from their applications and call a JS API for page loading, data crawling, automated web testing, and similar tasks.
In this case, Puppeteer loads some HTML and then takes a screenshot to generate a promotional poster image. This article analyzes the causes behind the incident. With that, on to the main content.
Processes
Each process has a unique identifier called the PID. The PID is a non-negative integer that can be viewed with the ps command. Running ps -ef on my Mac shows all the processes currently running, as shown below.
UID   PID  PPID   C STIME   TTY           TIME CMD
  0     1     0   0 604PM   ??        23:09.18 /sbin/launchd
  0    39     1   0 604PM   ??         0:49.66 /usr/sbin/syslogd
  0    40     1   0 604PM   ??         0:13.00 /usr/libexec/UserEventAgent (System)
The PID column shows the process ID.
Each process in the system has a parent process. The PPID column in the ps output shows the parent process ID. The PID and PPID of the top-level process are 1 and 0 respectively.
Opening a terminal in iTerm makes the iTerm process create a new child process, which in turn (through login) starts a zsh process. Typing a command such as ls in zsh makes zsh start yet another child process for that command. The process relationships for running a program from iTerm (here ./a.out) are shown below.
UID   PID  PPID   C STIME   TTY           TIME CMD
501   321     1   0 604PM   ??        61:01.45 /Applications/iTerm.app/Contents/MacOS/iTerm2 psn_0_81940
501 97920   321   0 8:02AM  ttys039    0:00.07 /Applications/iTerm.app/Contents/MacOS/iTerm2 --server login -fp Arthur
  0 97921 97920   0 8:02AM  ttys039    0:00.03 login -fp Arthur
501 97922 97921   0 8:02AM  ttys039    0:00.29 -zsh
501 98369 97922   0 8:14AM  ttys039    0:00.00 ./a.out
Processes and fork
When we say above that a parent process "creates" a child process, the more accurate term is fork. For a practical example, create a new fork_demo.c file.
#include <unistd.h>
#include <stdio.h>
int main() {
    int ret = fork();
    if (ret) {
        printf("enter if block\n");
    } else {
        printf("enter else block\n");
    }
    return 0;
}
Running the code above produces the following output.
enter if block
enter else block
Both the if branch and the else branch are executed, because after fork there are two processes and each one takes a different branch.
The fork call
fork is a system call, declared as follows.
pid_t fork(void);
After the fork call completes, a new child process exists, and both parent and child continue executing from the point where fork returns. Note that the return value of fork differs between the parent and the new child.
- In the parent process, fork returns the ID of the newly created child process
- In the child process, fork always returns 0
The parent and child can therefore be told apart by the return value of fork, and getpid can be used to obtain the current process ID at run time. Typical usage of fork looks like this.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
    printf("before fork, pid=%d\n", getpid());
    pid_t childPid;
    switch (childPid = fork()) {
        case -1: {
            // fork failed
            printf("fork error, %d\n", getpid());
            exit(1);
        }
        case 0: {
            // The child process code enters here
            printf("in child process, pid=%d\n", getpid());
            break;
        }
        default: {
            // The parent process code enters here
            printf("in parent process, pid=%d, child pid=%d\n", getpid(), childPid);
            break;
        }
    }
    return 0;
}
Execute the code above, and the output looks like the following.
before fork, pid=26070
in parent process, pid=26070, child pid=26071
in child process, pid=26071
The child process is a copy of the parent process: it gets its own copy of the parent's data space, heap, and stack. fork uses copy-on-write, so the call itself completes almost instantly; a memory region is actually copied only when the parent or the child modifies it.
Orphan processes: not born on the same day, and not dying on the same day either
Now a question: when the parent process dies, does the child process die too?
Think of real life: when a father passes away, his son can still live on. The same holds for processes: when the parent exits, its children keep running and do not die with it.
A process whose parent has terminated is called an orphan process. The operating system is humane about this: orphan processes are adopted by the process with PID 1, which is covered later.
Next, modify the previous code slightly so that the parent forks a child and then exits immediately, producing an orphan process. The code is shown below.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
    printf("before fork, pid=%d\n", getpid());
    pid_t childPid;
    switch (childPid = fork()) {
        case -1: {
            printf("fork error, %d\n", getpid());
            exit(1);
        }
        case 0: {
            printf("in child process, pid=%d\n", getpid());
            sleep(100000); // child process sleeps and does not exit
            break;
        }
        default: {
            printf("in parent process, pid=%d, child pid=%d\n", getpid(), childPid);
            exit(0); // Parent process exits
        }
    }
    return 0;
}
Compile and run the above code
gcc fork_demo.c -o fork_demo; ./fork_demo
The output is as follows.
before fork, pid=21629
in parent process, pid=21629, child pid=21630
in child process, pid=21630
The id of the parent process is 21629, and the ID of the generated child process is 21630.
Run the ps command to view the current process information. The following information is displayed:
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Dec12 ?        00:00:53 /usr/lib/systemd/systemd --system --deserialize 21
ya       21630     1  0 19:26 pts/8    00:00:00 ./fork_demo
You can see that the parent ID of the orphan child process 21630 has changed to the top-level process with ID 1.
Zombie processes
A parent that brings a child into the world but takes no responsibility for it afterwards is not a good parent. When a child process dies and its parent does not "collect the body" by calling wait/waitpid, the child becomes a zombie process.
Create a new make_zombie.c file with the following contents.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main() {
    printf("pid %d\n", getpid());
    int child_pid = fork();
    if (child_pid == 0) {
        printf("-----in child process: %d\n", getpid());
        exit(0);
    } else {
        sleep(1000000);
    }
    return 0;
}
Compile and run the above code to generate a zombie process with process number 22538, as shown below.
UID PID PPID C STIME TTY TIME CMD
ya 22537 20759 0 19:57 pts/8 00:00:00 ./make_zombie
ya 22538 22537 0 19:57 pts/8 00:00:00 [make_zombie] <defunct>
The <defunct> marker in the CMD column indicates that this is a zombie process.
You can also check the process state with the ps command. A state of "Z" or "Z+" indicates a zombie process, as shown below.
ps -ho pid,state -p 22538
22538 Z
After the child exits, most of its resources have already been released for reuse, but its slot in the kernel's process table has not.
Zombie processes have a surprising property: they cannot be killed, not even with kill -9, because they are already dead. This design cuts both ways. The good part is that the parent always has the opportunity to call wait/waitpid and reap the child; the bad part is that there is no way to force a zombie to be cleaned up from the outside.
The process with PID 1
On Linux, after kernel initialization, the system starts the first process, whose PID is 1. It is also called the init process or root process. On my CentOS machine, the init process is systemd, as shown below.
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Dec12 ?        00:00:54 /usr/lib/systemd/systemd --system --deserialize 21
On my Mac, it is launchd, as shown below.
UID   PID  PPID   C STIME   TTY           TIME CMD
  0     1     0   0 604PM   ??        [...].65 /sbin/launchd
The init process has the following responsibilities.
- If a process's parent exits, the init process adopts the orphaned process.
- If a process's parent exits without calling wait/waitpid, the init process takes over the process and calls wait automatically, so that zombie processes are removed from the system.
- It passes signals on to child processes, as we'll see later.
Why Node.js is not suitable as the PID 1 process in a Docker image
The Node.js Docker best-practices guide (github.com/nodejs/dock… ) puts it this way: "Node.js was not designed to run as PID 1 which leads to unexpected behaviour when running inside of Docker."
Two experiments follow: the first on a CentOS machine, the second inside a Docker container.
Experiment 1: On CentOS, systemd is the process with PID 1
For this test, modify the code above to shorten the parent's sleep to 15 seconds, and save it as a new make_zombie.c file, as shown below.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main() {
    printf("pid %d\n", getpid());
    int child_pid = fork();
    if (child_pid == 0) {
        printf("-----in child process: %d\n", getpid());
        exit(0);
    } else {
        sleep(15);
        exit(0);
    }
}
Compile to generate the executable make_zombie.
gcc make_zombie.c -o make_zombie
Then create a new run.js file that spawns a child process to run make_zombie, as shown below.
const { spawn } = require('child_process');
const cmd = spawn('./make_zombie');
cmd.stdout.on('data', (data) => {
    console.log(`stdout: ${data}`);
});
cmd.stderr.on('data', (data) => {
    console.error(`stderr: ${data}`);
});
cmd.on('close', (code) => {
    console.log(`child process exited with code ${code}`);
});
setTimeout(function () {
    console.log("...");
}, 1000000);
Run the JS code and use ps -ef to view the process relationships, as follows.
UID        PID  PPID  C STIME TTY          TIME CMD
ya       19234 19231  0 Dec20 ?        00:00:00 sshd: ya@pts/6
ya       19235 19234  0 Dec20 pts/6    00:00:01 -zsh
ya       29513 19235  3 15:28 pts/6    00:00:00 node run.js
ya       29519 29513  0 15:28 pts/6    00:00:00 ./make_zombie
ya       29520 29519  0 15:28 pts/6    00:00:00 [make_zombie] <defunct>
After 15 seconds, run ps -ef again. The make_zombie processes are gone.
UID        PID  PPID  C STIME TTY          TIME CMD
ya       19234 19231  0 Dec20 ?        00:00:00 sshd: ya@pts/6
ya       19235 19234  0 Dec20 pts/6    00:00:01 -zsh
ya       29513 19235  3 15:28 pts/6    00:00:00 node run.js
This is because the parent make_zombie process (PID 29519) exits after 15 seconds, and its zombie child is then adopted by the init process, which calls wait/waitpid to reap it.
Experiment 2: In Docker, node is the process with PID 1
Package the make_zombie executable and run.js into a .tar.gz package (test.tar.gz in the Dockerfile below) and create a new Dockerfile with the following contents.
# specify the base image
FROM registry.gz.cctv.cn/library/your_node_image:your_tag
WORKDIR /
# Copy the package file to the working directory
ADD test.tar.gz .
# specify the start command
CMD ["node", "run.js"]
Start a container from the image (ID ab71925b5154) with docker run ab71925b5154. Use docker ps to find the CONTAINER ID, which is e37f7e3c2e39. Then use docker exec to open a shell in the container.
docker exec -it e37f7e3c2e39 /bin/bash
Run the ps command to check the current processes, as shown below.
UID PID PPID C STIME TTY TIME CMD
root 1 0 1 07:52 ? 00:00:00 node run.js
root 12 1 0 07:52 ? 00:00:00 ./make_zombie
root 13 12 0 07:52 ? 00:00:00 [make_zombie] <defunct>
After 15 seconds, run ps again to view the current process, as shown below.
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 07:52 ? 00:00:00 node run.js
root 13 1 0 07:52 ? 00:00:00 [make_zombie] <defunct>
You can see that the zombie process with PID 13 has been adopted by the node process with PID 1, but it has not been reaped.
This is the main reason node is not a good init process: it does not reap the zombie processes it adopts.
The same applies to npm. npm run start uses the npm process to start a child process that runs the startup script defined under scripts in package.json, as shown below.
{
    "name": "test-demo",
    "version": "1.0.0",
    "description": "",
    "main": "index.js",
    "scripts": {
        "test": "echo \"Error: no test specified\" && exit 1",
        "start": "node run.js"
    },
    "keywords": [],
    "author": "",
    "license": "ISC",
    "dependencies": {}
}
Start it with npm run start. The resulting processes are shown below.
ya       19235 19234  0 Dec20 pts/6    00:00:01 -zsh
ya       32252 19235  0 16:32 pts/6    00:00:00 npm
ya       32262 32252  0 16:32 pts/6    00:00:00 node run.js
Like node, npm does not reap zombie child processes.
Analysis of the production incident
In the service that failed, we used npm start to launch the Puppeteer project. Each generated image created four Chrome-related processes, as shown below.
.
└── chrome (1)
    ├── gpu-process (2)
    └── zygote (3)
        └── renderer (4)
Once an image has been generated, the main chrome process exits. The remaining three processes become orphans and are adopted by the top-level npm process, but npm does not reap them when they exit, so three new zombie processes are left behind for every image generated. After thousands of images, the system was full of zombie processes.
The solution
To solve this problem, do not make node/npm the init process. Instead, use as init a program that is able to reap the zombie processes it adopts. There are two solutions.
- Start node or npm from bash
- Add a dedicated init process, such as tini
Solution 1: Use bash to start Node
The quicker fix is to make bash the top-level process and let it reap zombie processes. Modify the Dockerfile as shown below.
ADD test.tar.gz .
# CMD ["npm", "run", "start"]
CMD ["/bin/bash", "-c", "set -e && npm run start"]
This approach is simple, and it is also why the service had no problems in production before: node was originally started through bash in exactly this way, until a junior colleague changed the startup command to npm run start.
However, bash is not a perfect solution, and it has a serious problem: bash does not forward signals to the processes it starts, so features such as graceful shutdown cannot be implemented.
To verify the claim that bash does not pass signals to its child processes, create a new signal_test.c file that handles SIGQUIT, SIGINT, and SIGTERM.
#include <signal.h>
#include <stdio.h>
static void signal_handler(int signal_no) {
    if (signal_no == SIGQUIT) {
        printf("quit signal receive: %d\n", signal_no);
    } else if (signal_no == SIGTERM) {
        printf("term signal receive: %d\n", signal_no);
    } else if (signal_no == SIGINT) {
        printf("interrupt signal receive: %d\n", signal_no);
    }
}
int main() {
    printf("in main\n");
    signal(SIGQUIT, signal_handler);
    signal(SIGINT, signal_handler);
    signal(SIGTERM, signal_handler);
    getchar();
    return 0;
}
If you send kill -2, -3, or -15 to the signal_test program on CentOS or macOS, the program prints the corresponding line, showing that the signal was received, as shown below.
kill -15 47120
term signal receive: 15
kill -3 47120
quit signal receive: 3
kill -2 47120
interrupt signal receive: 2
When the program is started through bash in a Docker container, bash does not pass signals on to signal_test when kill is sent to bash. When docker stop is executed, Docker sends SIGTERM (15) to bash, and bash does not pass this signal to the application it started. After the stop timeout expires (10 seconds by default), Docker sends SIGKILL (kill -9) to forcibly kill the container's processes, so a graceful shutdown is impossible.
This leads to the second solution.
Solution 2: Use a dedicated init process
The Node.js Docker best practices offer two alternatives. The first is to use Docker's official lightweight init system, as shown below.
docker run -it --init you_docker_image_id
This startup mode uses /sbin/docker-init as the init process with PID 1, instead of making the CMD from the Dockerfile the first process.
Take the following Dockerfile content as an example.
...
CMD ["./signal_test"]
...
Run docker run -it --init image_id to start the container. The processes inside the container are then as follows.
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 15:30 pts/0 00:00:00 /sbin/docker-init -- /app/node-default
root 6 1 0 15:30 pts/0 00:00:00 ./signal_test
You can see that the signal_test program was started as a child of docker-init.
When docker stop sends SIGTERM to the container, the docker-init process forwards the signal to signal_test, and the application can handle SIGTERM in its own way, for example by shutting down gracefully.
Besides Docker's official solution, the Node.js best practices also recommend a very small init process written in C, such as Tini (github.com/krallin/tin… ). The code is short and well worth reading, and it is very helpful for understanding signal handling and zombie reaping.
Summary
I hope this article has helped you understand what zombie processes, orphan processes, and the PID 1 process are, why node/npm is not suitable as the PID 1 process, and what the drawbacks of bash as the PID 1 process are.
Here is an assignment to test your understanding of the process fork function. How many new processes will result from three consecutive fork() calls?
#include <stdio.h>
#include <unistd.h>
int main() {
    printf("Hello, World! \n");
    fork();
    fork();
    fork();
    sleep(100);
    return 0;
}