The project won't start!

The project won't start... again!

The project won't start... 叒-gain (yet again)!!!

Last week I heard several colleagues complain that the project was down yet again because Redis couldn't be connected to. I was busy developing new features in another, still-healthy environment, so I didn't have time to worry about it.

A quick look at the log showed too many connections... Emmm. I appended a 0 to the connection limit in the Redis configuration for my colleague and left the real troubleshooting for another day.

ERR max number of clients reached
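For the record, the stopgap itself is just one setting. A minimal sketch, assuming the limit in question is Redis's standard maxclients option (the author's exact config change isn't shown):

$ docker exec -it $(docker ps | grep redis | awk '{print $1}') redis-cli -a {pwd}
127.0.0.1:6379> config get maxclients
127.0.0.1:6379> config set maxclients 100000

config set only changes the running instance; for the new limit to survive a restart, the same maxclients value has to be written into redis.conf.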

In the end there was no escaping it: the environment I was using got taken down too, and it went cold for no apparent reason.

But... interesting!



After all, raising the connection limit is a palliative, not a cure. The original limit was 10,000 connections, and there are only a few dozen microservices in total.

Someone must be leaking connections. Time to hunt them down.



Known:

After the Redis service restarts, it takes a while before the connections fill up again. There are only a few dozen services in total, so the configured limit of 10,000 connections should not be exhausted under normal circumstances.

Step 1 Restart and grab a connection first

After restarting Redis, I immediately connected and checked the number of clients.

$ docker exec -it $(docker ps | grep redis | awk '{print $1}') redis-cli -a {pwd}
127.0.0.1:6379> info
...
# Clients
connected_clients:391
...

tips:

The info command displays various information and statistics about the Redis server.
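Incidentally, info also takes a section name, so you can pull just the client statistics when that is all you care about:

127.0.0.1:6379> info clients
# Clients
connected_clients:391
...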

Step 2 Record all clients

Check again a few minutes later:

127.0.0.1:6379> info
...
# Clients
connected_clients:10002
...

At this point, all the connections are used up.



Save all the client information to a file so we can catch whoever is doing this.

127.0.0.1:6379> client list
id=7863 addr=172.18.0.104:56836 fd=6150 name= age=72 idle=72 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=7864 addr=172.18.0.50:56262 fd=6151 name= age=72 idle=72 flags=N db=9 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=7865 addr=172.18.0.104:56840 fd=6152 name= age=72 idle=72 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
...
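The listing above comes from the interactive prompt; to get it into the client-list file used in the next step, something along these lines works (a sketch, since the exact command isn't shown in the original):

$ docker exec $(docker ps | grep redis | awk '{print $1}') redis-cli -a {pwd} client list > client-list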



tips:

client list: lists information about all connected clients.

Step 3 Find the abnormal IP address

With all the client connection information in hand, we can figure out who's to blame. Run the following command to output the five IPs with the most connections:

$ cat client-list | awk '{print $2}' | awk -F "[=:]" '{print $2}' | sort | uniq -c | sort -k1,1nr | head -5
   5432 172.18.0.50
   4244 172.18.0.104
     43 172.18.0.59
     40 172.18.0.54
     32 172.18.0.55

So far we have locked onto 172.18.0.50 and 172.18.0.104. Both are addresses on the Docker internal network.

tips:

awk '{print $2}': prints the second field, i.e. the addr part, e.g. addr=172.18.0.104:56836.

awk -F "[=:]" '{print $2}': splits addr=172.18.0.104:56836 on '=' and ':' and prints the IP.

sort: sorts the lines so duplicates sit next to each other.

uniq -c: collapses duplicate lines and prefixes each one with its repeat count.
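If you prefer, the splitting and counting can also be done in a single awk pass; this is just an equivalent variant of the pipeline above:

$ awk '{split($2, a, "[=:]"); c[a[2]]++} END {for (ip in c) print c[ip], ip}' client-list | sort -k1,1nr | head -5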

Step 4 Locate the service and hand over the pot

With the IPs in hand, the target is not far away. docker inspect outputs a container's metadata, including its IP.

$ docker inspect --format='{{.Name}} - {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -aq) | grep 172.18.0.50
/docker_xxxxx-service - 172.18.0.50
$ docker inspect --format='{{.Name}} - {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -aq) | grep 172.18.0.104
/docker_yyyyy-service - 172.18.0.104

tips:

docker inspect: fetches metadata for a container or image. --format: formats the output using a Go template.
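An alternative, if you know which Docker network the containers are attached to, is to ask the network itself rather than inspecting every container; the network name below is a placeholder:

$ docker network inspect --format='{{range .Containers}}{{.Name}} - {{.IPv4Address}}{{println}}{{end}}' some_network | grep 172.18.0.50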

xxxxx and yyyyy... hmm, both are services owned by the Py(thon) group next door. Here you go, the pot is all yours.



If this article helped you, please share it (~ ▽ ~)"