Hello everyone, I am Glacier ~~

I guess the offerings we paid to the servers before the New Year have worn off, because after the New Year the servers always manage to have some problem or other. I don't know whether it's the people or the feng shui. When I left work yesterday, I reminded my operations colleague several times: if you install the Kafka cluster with Docker, you also need to allocate plenty of hard disk space to the Kafka cluster servers. The company's business volume is very large, and a lot of inter-service communication, data streaming, and log collection and transport all go through the Kafka message bus.

Unexpectedly, the moment I arrived at the company this morning and opened my computer, my inbox was flooded with server alarm emails. Then, on the big monitoring screen, I saw that several test servers on the intranet had gone down. You can imagine the look on my face.

Holy shit. What's going on? Trouble the moment I walk in? Which servers are down? One look at the big screen and... no way, aren't those the very Kafka cluster servers I talked to the ops guy about yesterday?

They fail right after I warned about them? It can't be that much of a coincidence, can it?

So I hurried over to my operations colleague and asked: how exactly did you configure those servers yesterday?

He said: I didn't configure anything special. Isn't it just a test environment? I gave each server 120GB of space and installed the Kafka cluster with the default settings.

Me: Didn't I tell you to make the servers' disk space much bigger?

However speechless I felt, the problem still had to be solved! So I quickly logged in to the server and, from the command line, tried to switch the terminal's current directory to Docker's default directory.

[root@localhost ~]# cd /var/lib/docker

The result is an error, as shown below.

[root@localhost ~]# ls
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
-bash: cannot create temp file for here-document: No space left on device
...(the same error, repeated over and over)...

I couldn't even switch directories. What now? Out of habit, I checked the server's disk usage, and everything became clear.

[root@localhost ~]# df -lh
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                    3.8G     0  3.8G   0% /dev
tmpfs                       3.9G     0  3.9G   0% /dev/shm
tmpfs                       3.9G   82M  3.8G   3% /run
tmpfs                       3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/localhost-root   50G   50G    0G 100% /
/dev/sda1                   976M  144M  766M  16% /boot
/dev/mapper/localhost-home   53G    5G   48G  10% /home
tmpfs                       779M     0  779M   0% /run/user/0
overlay                      50G   50G    0G 100% /var/lib/docker/overlay2/d51b7c0afcc29c49b8b322d1822a961e6a86401f0c6d1c29c42033efe8e9f070/merged
overlay                      50G   50G    0G 100% /var/lib/docker/overlay2/0e52ccd3ee566cc16ce4568eda40d0364049e804c36328bcfb5fdb92339724d5/merged
overlay                      50G   50G    0G 100% /var/lib/docker/overlay2/16fb25124e9b85c7c91f271887d9ae578bf8df058ecdfece24297967075cf829/merged

Damn it, the root partition is 100% full, exactly as I suspected. And several important lines stand out in the output, as shown below.

overlay 50G 50G 0G 100% /var/lib/docker/overlay2/d51b7c0afcc29c49b8b322d1822a961e6a86401f0c6d1c29c42033efe8e9f070/merged
overlay 50G 50G 0G 100% /var/lib/docker/overlay2/0e52ccd3ee566cc16ce4568eda40d0364049e804c36328bcfb5fdb92339724d5/merged
overlay 50G 50G 0G 100% /var/lib/docker/overlay2/16fb25124e9b85c7c91f271887d9ae578bf8df058ecdfece24297967075cf829/merged

Aren't those the overlay mounts under Docker's default image directory?

What's the next step? As we can see, the /home partition still has plenty of free space, so we can move Docker's default image directory from /var/lib/docker to /home/docker to temporarily relieve the pressure on the server and keep the tests running. The rest can wait until the servers are reassigned.
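Before actually migrating, it may be worth a quick sanity check that it really is Docker's data eating the disk. A small sketch (assuming the daemon is still responsive; docker system df has been available since Docker 17.x, so our 19.03 daemon has it):

du -sh /var/lib/docker    # total size of Docker's data root
docker system df          # usage broken down by images, containers, and volumes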

So, without further ado, I started migrating Docker's default image directory.

There are two ways to migrate Docker's default image directory, and I'll walk you through both: one is the soft link method; the other is to modify the configuration. Let's take a look at each of these approaches.

1. The soft link method

(1) By default, Docker stores its data in /var/lib/docker. We can check Docker's current root directory with the following command.

[root@localhost ~]# docker info | grep "Docker Root Dir"
Docker Root Dir: /var/lib/docker

(2) Next, we run the following command to stop the Docker service.

systemctl stop docker

or

service docker stop

(3) Then move the /var/lib/docker directory to the /home directory.

mv /var/lib/docker /home

This process may take a long time.
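By the way, if you would rather copy first and delete the original only after verifying, or you want progress output during the transfer, rsync is an alternative to mv. This is just a sketch, assuming rsync is installed on the server:

rsync -aHAXP /var/lib/docker/ /home/docker/
# -a archive mode, -H hard links, -A ACLs, -X extended attributes, -P progress
# after verifying the copy is complete:
# rm -rf /var/lib/docker

Preserving hard links and extended attributes matters here, because storage drivers such as overlay2 rely on xattrs for their whiteout and opaque-directory markers.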

(4) Next, create a soft link in the original location, as shown below.

ln -s /home/docker /var/lib/docker
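Before starting Docker again, you can sanity-check that the link points where you expect:

ls -ld /var/lib/docker    # should now show a symlink pointing to /home/docker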

(5) Finally, we start the Docker service.

systemctl start docker

or

service docker start

(6) Check the Docker image directory again, as shown below.

[root@localhost ~]# docker info | grep "Docker Root Dir"
Docker Root Dir: /home/docker

At this point, the Docker image directory is successfully migrated.

Next, let's talk about modifying the configuration.

2. Modify the configuration method

By default, Docker's storage location is set by the dockerd startup parameter --graph, which defaults to --graph=/var/lib/docker. So we only need to modify the service configuration file and change this startup parameter.
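A side note, in case you are on a newer Docker release: --graph is the older flag and has been deprecated in favor of --data-root, and the same setting can also live in /etc/docker/daemon.json instead of the unit file. A minimal sketch:

{
  "data-root": "/home/docker"
}

Both spellings still work on the 19.03 daemon used here; the steps below follow the --graph approach.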

In this case, the server operating system I am using is CentOS. Therefore, the Docker configuration can be modified as follows.

(1) Stop Docker service

systemctl stop docker

or

service docker stop

(2) Modify the docker service startup file.

vim /etc/systemd/system/multi-user.target.wants/docker.service

Find the ExecStart line in the startup file and add the --graph parameter to it, as shown below (your existing line may carry additional flags; just append --graph=/home/docker to it).

ExecStart=/usr/bin/dockerd --graph=/home/docker
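One more systemd tip: the unit file edited above can be overwritten when the Docker package is upgraded. A drop-in override is the more durable systemd idiom; a sketch of the equivalent change:

systemctl edit docker
# in the editor that opens, enter:
# [Service]
# ExecStart=
# ExecStart=/usr/bin/dockerd --graph=/home/docker

The empty ExecStart= line is required: it clears the packaged command before the new one replaces it.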

(3) Reload the systemd configuration and start Docker.

systemctl daemon-reload
systemctl start docker

(4) Check the Docker image directory again, as shown below.

[root@localhost ~]# docker info | grep "Docker Root Dir"
Docker Root Dir: /home/docker

At this point, the Docker image directory is successfully migrated.
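Whichever method you use, once Docker is back up it does not hurt to confirm that the images and containers survived the move:

docker images    # the image list should match what was there before the move
docker ps -a     # all containers, running and stopped, should still be present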

With this, the Kafka cluster could be used temporarily and the data kept flowing. I then requisitioned new servers, rebuilt the Kafka cluster, and by noon had moved the test environment over to the new cluster. It's still being tested...

Did you guys get it?

PS: The server operating system version I am using is as follows.

[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 8.1.1911 (Core) 

The Docker version used is as follows.

[root@localhost ~]# docker info
Client:
 Debug Mode: false

Server:
 Containers: 4
  Running: 3
  Paused: 0
  Stopped: 1
 Images: 33
 Server Version: 19.03.8
############ other output omitted ############

Finally, why did I want larger hard disks for the Kafka cluster servers in the first place?

Because the traffic in our production environment is relatively heavy, usually 50,000 to 80,000 QPS, and much higher during peaks. At the time, I was diverting some production traffic to the test environment. If the disks on the Kafka cluster servers are not large enough, then when consumer performance degrades, or messages pile up in Kafka for any other reason, Kafka eats up a great deal of disk space. Once the disk fills up, Kafka on that server crashes and the node goes down.
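To put rough numbers on that (the per-message size here is an assumption for illustration, not a measured figure): at 80,000 messages per second and about 1 KB per message, a broker ingests roughly 80 MB/s, which is close to 7 TB per day before replication, so a 120GB disk can fill up in under half an hour of backlog. Capping retention in server.properties is the usual safety net; an illustrative sketch, not our production values:

# server.properties: bound how much disk Kafka may use
log.retention.hours=72            # delete segments older than 3 days
log.retention.bytes=53687091200   # ~50 GB cap, applied per partition
log.segment.bytes=1073741824      # 1 GB segments, so old data can be reclaimed promptly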

Well, that’s all for today, I’m Glacier, and I’ll see you next time