Background:

The platform we deployed for the customer before the holiday failed: the project was stuck at 5% and could not be deployed normally. The cause was that the customer’s machine runs on the AWS public cloud, and their policy is to delete the entire machine and rebuild it once the application becomes inaccessible, so all the components deployed on it were lost…

If a service still won’t start after you redeploy the related component, look at its error message.

  • Log in to the customer’s machine to check the service status
[root@10-251-180-180 ~]# docker ps -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
[root@10-251-180-180 system]# systemctl status builder.service
● builder.service - builder Container
   Loaded: loaded (/usr/lib/systemd/system/builder.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2021-09-30 15:27:13 CST; 1 weeks 0 days ago
  Process: 57932 ExecStopPost=/bin/docker rm -f builder.service (code=exited, status=0/SUCCESS)
  Process: 57922 ExecStart=/bin/docker run --rm --privileged --net=host --name builder.service -e BUILDER_PROFILE=dev -v /var/run/docker.sock:/var/run/docker.sock -v /bin/docker:/bin/docker -v /data/server/builder/log:/data/server/builder/log -v /data/server/builder/conf/builder.cfg:/data/server/builder/conf/builder/dev/builder.cfg -v /data/server/builder/conf/certs/dev/ssl:/data/server/builder/conf/certs/dev/rabbitmq/ssl -v /data/server/builder/tmp/workspace:/data/server/builder/tmp/workspace -v /data/server/earth-service-template:/data/server/earth-service-template -v /etc/localtime:/etc/localtime registry-poc.cnnol.uds-qa.lenovo.com/xcloud-product/earth-builder:1.0.35 (code=exited, status=127)
  Process: 57914 ExecStartPre=/bin/docker rm -f builder.service (code=exited, status=0/SUCCESS)
  Process: 57904 ExecStartPre=/bin/docker stop builder.service (code=exited, status=1/FAILURE)
 Main PID: 57922 (code=exited, status=127)

Sep 30 15:27:13 10-251-180-180.earth-paas systemd[1]: Unit builder.service entered failed state.
Sep 30 15:27:13 10-251-180-180.earth-paas systemd[1]: builder.service failed.
Sep 30 15:27:13 10-251-180-180.earth-paas systemd[1]: builder.service holdoff time over, scheduling restart.
Sep 30 15:27:13 10-251-180-180.earth-paas systemd[1]: Stopped builder Container.
Sep 30 15:27:13 10-251-180-180.earth-paas systemd[1]: start request repeated too quickly for builder.service
Sep 30 15:27:13 10-251-180-180.earth-paas systemd[1]: Failed to start builder Container.
Sep 30 15:27:13 10-251-180-180.earth-paas systemd[1]: Unit builder.service entered failed state.
Sep 30 15:27:13 10-251-180-180.earth-paas systemd[1]: builder.service failed.
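The same state can be checked from a script instead of reading the full status text. The following is only a sketch: the function name summarize_unit is hypothetical, and it assumes its input is the KEY=VALUE format printed by systemctl show, limited to the ActiveState and Result properties.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a unit from `systemctl show`-style
# KEY=VALUE lines read on stdin. Only ActiveState and Result are used.
summarize_unit() {
    local key value active="unknown" result="unknown"
    while IFS='=' read -r key value; do
        case "$key" in
            ActiveState) active="$value" ;;
            Result)      result="$value" ;;
        esac
    done
    echo "active=$active result=$result"
}

# On a live host (assumed invocation, not taken from the original session):
#   systemctl show builder.service -p ActiveState -p Result | summarize_unit
```

A result of start-limit means systemd stopped retrying because the unit failed too many times in a short interval, so the real error has to be found in the service log. Once the underlying cause is fixed, systemctl reset-failed builder.service clears the failed state before the next start attempt.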

No containers are running, and the builder unit is in a failed state. The next step is to look at the service’s run log:

[root@10-251-180-180 system]# journalctl -f -u builder.service
-- Logs begin at Mon 2021-09-20 23:00:31 CST. --
Oct 08 10:24:42 10-251-180-180.earth-paas systemd[1]: Starting builder Container...
Oct 08 10:24:42 10-251-180-180.earth-paas docker[71609]: Error response from daemon: No such container: builder.service
Oct 08 10:24:42 10-251-180-180.earth-paas docker[71617]: Error: No such container: builder.service
Oct 08 10:24:42 10-251-180-180.earth-paas systemd[1]: Started builder Container.
Oct 08 10:24:42 10-251-180-180.earth-paas docker[71627]: Unable to find image 'XXXXX/builder:1.0.35' locally
Oct 08 10:24:42 10-251-180-180.earth-paas docker[71627]: 1.0.35: Pulling from xxxxx/builder
Oct 08 10:24:42 10-251-180-180.earth-paas docker[71627]: 534e72e7cedc: Pulling fs layer
Oct 08 10:24:43 10-251-180-180.earth-paas docker[71627]: docker: open /data/docker/tmp/GetImageBlob205343402: no such file or directory.
Oct 08 10:24:43 10-251-180-180.earth-paas docker[71627]: See 'docker run --help'.
Oct 08 10:24:43 10-251-180-180.earth-paas systemd[1]: builder.service: main process exited, code=exited, status=127/n/a
[root@10-251-180-180 system]# docker pull registry-poc.cnnol.uds-qa.lenovo.com/xcloud-product/earth-builder:1.0.35
1.0.35: Pulling from xcloud-product/earth-builder
534e72e7cedc: Pulling fs layer
924d479f8494: Pulling fs layer
530c8d5bb194: Pulling fs layer
25f403377f83: Pulling fs layer
fefb0ce67cb3: Waiting
open /data/docker/tmp/GetImageBlob691109094: no such file or directory

The problem is now narrowed down to the image pull, which fails with "open /data/docker/tmp/GetImageBlob691109094: no such file or directory". Next we checked whether Docker itself was running normally, and it appeared to be.

[root@10-251-180-180 system]# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.1-docker)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 19.03.15
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e25210fe30a0a703442421b0f60afac609f950a3
 runc version: v1.0.1-0-g4144b63
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-1160.31.1.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.22GiB
 Name: 10-251-180-180.earth-paas
 ID: CXRK:PGI5:L7UF:5ZPB:42AV:34PJ:HCV7:NNCH:UNPQ:5T2Q:HW4Y:QLOM
 Docker Root Dir: /data/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
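Docker Root Dir in this output is /data/docker, and the failing GetImageBlob path sits in the tmp directory underneath it. The daemon is expected to create that directory when it starts, so a quick check of whether it is actually visible at that path can help. The sketch below is not from the original session, and check_docker_tmp is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the temporary download directory
# exists under the given Docker root directory.
check_docker_tmp() {
    local root="$1"
    if [ -d "$root/tmp" ]; then
        echo "ok: $root/tmp exists"
    else
        echo "missing: $root/tmp"
        return 1
    fi
}

# On a live host (assumed invocation; the root is /data/docker in this article):
#   check_docker_tmp "$(docker info --format '{{.DockerRootDir}}')"
```

If the directory is reported missing while the daemon is running, the filesystem under the Docker root has probably changed since the daemon started, which points at the disk and its mounts rather than at Docker.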

We then deleted the Docker data, but the service still could not start:

[root@10-251-180-180 system]# rm -rf /data/docker/*

Next, look at the details of the disk that holds the Docker data:

[root@10-251-180-180 system]# mount -n | grep data    # two mounts on the same /data mount point; what is going on here?
/dev/nvme0n1p1 on /data type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/nvme1n1 on /data type ext3 (rw,relatime,seclabel,data=ordered)
[root@10-251-180-180 system]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        7.6G     0  7.6G   0% /dev
tmpfs           7.7G     0  7.7G   0% /dev/shm
tmpfs           7.7G   25M  7.1G   1% /run
tmpfs           7.1G     0  7.1G   0% /sys/fs/cgroup
/dev/nvme0n1p1   20G  2.4G   18G  12% /
/dev/nvme1n1    197G   61M  187G   1% /data
tmpfs           1.6G     0  1.6G   0% /run/user/1001
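The key finding is that two devices are mounted on /data, and df only shows one of them because a later mount hides an earlier one at the same path. This condition can be detected mechanically. The sketch below assumes input in the /proc/mounts format (device, mount point, filesystem type, options, and so on); the function name stacked_mounts is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every mount point that appears more than once
# in a mount table. Reads /proc/mounts-style lines on stdin, where the
# second whitespace-separated field is the mount point.
stacked_mounts() {
    awk '{ count[$2]++ } END { for (m in count) if (count[m] > 1) print m }' | sort
}

# On a live host:
#   stacked_mounts < /proc/mounts
```

An empty result means no mount point is used twice. With the two /data entries shown by the mount command above, the function would print /data.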

The fix is to unmount the extra /dev/nvme0n1p1 mount on /data and then restart Docker.