This post was originally published on Jamki's personal website.
The cause
One morning on the bus, checking my phone on the way to work, I suddenly saw the department group chat erupt: "The site won't open?!" "Pages are failing intermittently" "How did this happen?"… Still half asleep, I was so startled I nearly dropped my phone. Anxious, I almost pulled out my laptop right there on the bus. After ten nervous minutes I hurried off, rushed back to my desk at the company, and started investigating.
Locating the problem
I opened the Alibaba Cloud console and looked at the errors from the past half hour or so. They were basically all the same:
2020-12-10 08:50:19,868 ERROR 168244 [-/127.0.0.1/-/195ms POST /user/site/list] nodejs.ReplyError OOM command not allowed when used memory > 'maxmemory'.
What does that mean? Redis has overflowed! The current Redis memory usage exceeds its configured maximum (maxmemory).
I got on the server and started working. First, Redis's memory information:
redis-cli info memory
Boy, the maximum capacity was only 4.66G, and 4.65G was already used; that's what the arrows in the screenshot pointed at (the used memory and maxmemory fields). I was too flustered to take screenshots during the actual incident; the output I put here was captured afterwards just to show how to check Redis memory usage, so it doesn't reflect the memory situation at the time.
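For reference, these are the lines to look at in the info memory output (the values below are reconstructed to match the story, not an actual capture). With the default maxmemory-policy of noeviction, Redis starts rejecting writes with exactly the OOM error above once used memory reaches maxmemory:
used_memory_human:4.65G
maxmemory_human:4.66G
maxmemory_policy:noeviction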
Now it was confirmed: Redis had overflowed. Let's see what was taking up so much memory:
redis-cli --bigkeys
The output looked something like this (the data shown is not from the incident):
The redis-cli --bigkeys output reports the biggest key of each type (the largest string key, set key, hash key, and so on) along with how many bytes or members it holds. But the biggest single key was no more than a few tens of megabytes, so one key alone couldn't be the cause; more likely, something had created millions of keys and filled Redis's memory.
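(Incidentally, to check the footprint of one specific suspect rather than scanning everything, Redis 4.0 and later also has the MEMORY USAGE command; the key name below is just a placeholder:
redis-cli memory usage 'sitemap_xxxx'
It returns the number of bytes the key and its value occupy.)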
The summary at the end of the output confirmed my suspicion:
-------- summary -------
Sampled 11998929 keys in the keyspace!
Total key length in bytes is 430089742 (avg len 35.84)
Biggest string found 'xxxx' has 789970 bytes
Biggest set found 'xxxx' has 20841 members
Biggest hash found 'xxxx' has 3393675 fields
There were over 10 million keys in Redis! I gasped. How could that be? More than 10 million…
The investigation
To figure out what these 10-million-plus keys looked like, I first dumped all of them to a text file. For one thing, I knew I'd be querying the data over and over during the analysis, and repeated large-scale scans would interfere with normal Redis reads; for another, it preserved the "evidence" and made it easier to analyze the root cause afterwards.
Dump all the keys to a file:
redis-cli keys "*" > /data/redis-key.log
Note that keys * is not recommended in production: it blocks the server while it scans the entire keyspace, which on a large Redis instance can stall things for a while and affect other clients. But this was an emergency; the failure had already happened, so using it was forgivable.
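For the record, a gentler way to dump every key is redis-cli's built-in SCAN iteration, which walks the keyspace in small cursor-based batches instead of one long blocking call:
redis-cli --scan > /data/redis-key.log
I didn't reach for it in the heat of the moment, but it's the safer habit on a live instance.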
OK, now that I had all the keys, I wanted to see what kind of key was filling the memory. First, for each key prefix my program writes to Redis, I counted how many matching lines appeared in the key file. For example, to count how many keys start with sitemap_:
cat /data/redis-key.log | grep 'sitemap_' | wc -l
It turned out that even the most common key type I checked had only tens of thousands of entries.
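In hindsight, rather than grepping prefix by prefix, a rough frequency table of key prefixes would have surfaced the problem faster. This sketch assumes underscore-separated prefixes like sitemap_; known prefixes float to the top with their counts, and if those counts are nowhere near 12 million, the bulk of the keyspace is something unprefixed:
awk -F_ '{print $1}' /data/redis-key.log | sort | uniq -c | sort -rn | head -20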
What else?
So I decided to randomly sample 100,000 keys and see what they looked like:
shuf -n 100000 /data/redis-key.log > /data/redis-ramdom-key-100000.log
With 100,000 random entries pulled out, let's take a look:
less /data/redis-ramdom-key-100000.log
(By the way, on Linux there are several common commands for viewing logs and text files, such as cat, less, and tail. cat prints the whole file, which suits small files; less shows one screenful at a time and loads as you scroll, which suits large files; tail is usually used to follow a log in real time or to print the last few lines of a file. It's worth knowing which one fits the file you're about to open.)
Looking at the output, every line was in this format:
f30a0485-b59f-4939-a41d-3955786b37e0
This is the session ID created by egg-session.
I remembered that the session expiration time was set to two days. Could the expiration setting have failed to take effect, letting session records pile up in Redis? To test the idea, I had to check whether any existing session had lived longer than two days.
redis-cli ttl checks a key's remaining time to live (see the Redis documentation for details). I picked one from the session ID list to check whether its TTL exceeded two days. Note that TTL reports seconds, so two days is 172800 seconds:
redis-cli ttl 'f30a0485-b59f-4939-a41d-3955786b37e0'
It turned out not to exceed two days. Then I checked the TTL of all 100,000 random keys:
cat /data/redis-ramdom-key-100000.log | xargs -I key redis-cli ttl key > /data/key_ttl.log
xargs takes each line of input, substitutes it for the placeholder key (declared by -I key), and hands it to redis-cli ttl for execution; the resulting TTLs are saved to key_ttl.log. Incidentally, xargs is a very powerful and handy tool; if you want your shell pipelines to flow, it's well worth learning.
Now let’s see if there are any longer than two days and print them directly:
cat /data/key_ttl.log | awk '{if($0 > 172800) print $0}'
This filters the 100,000 TTLs we just collected and prints any that exceed two days. But to my puzzlement, there was no output: every remaining lifetime was within two days! (One subtlety worth checking: TTL returns -1 for a key with no expiry set at all, and -1 would also produce no output here, so it pays to also grep the file for -1.) Oh no.
Since expiration was fine, was something wrong with creation instead? I grabbed a random session ID key and printed out its value:
redis-cli get 'f30a0485-b59f-4939-a41d-3955786b37e0'
I won't show the exact structure, but it gave me an important piece of information: this session contained no login or business data, only a few default values. That shouldn't happen; a session ought to carry login information, user information, and so on. If the user information was empty, could a session be created without logging in at all? To test the idea, I tried it out:
I started my local project, cleared the related Redis records beforehand, opened the page without logging in, then immediately checked Redis, and oh no, it really had created a session for me… Now I knew exactly what was going on: someone had been hammering our site with junk traffic, and every request was creating a session.
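(A handy way to watch this happen live against a local dev Redis is to run MONITOR in another terminal; don't do this on a busy production instance, since it echoes every command the server receives:
redis-cli monitor
Load the page in a browser with no login cookie and you can watch the session write scroll past.)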
With that realization, I opened the nginx access log:
tail -f xxxxx.log
Sure enough, requests were gushing in. I thought: that shouldn't be possible, doesn't the program rate-limit by IP? How could they brush past it? I pulled a few of the suspicious IPs out of the nginx log and checked the rate-limiting records in our middle layer: each IP had only 3 requests on record. In other words, whoever was flooding us had a very large IP pool and rotated addresses to dodge the blocking, switching IPs after just 3 requests each. I thought to myself: this person is fierce; with an IP pool that big, they really spared no effort.
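The quick way to see how the traffic spread across addresses, assuming the default nginx log format with the client IP as the first field:
awk '{print $1}' xxxxx.log | sort | uniq -c | sort -rn | head -20
With a pool like theirs, the per-IP counts just sit at 3 or below all the way down the list.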
One question remained: why was a session created even without logging in? That's what gave the traffic-flooders their opening, and I had to change it. Leaving it that way was far too dangerous!
Exploring solutions
egg-session-redis automatically creates a session as soon as a request arrives without a session key and there is any data to write (whatever it is, even if it's essentially empty), but that's not what I want. What I want is for a session to be created only after login. After reading the source code of egg-session, egg-session-redis, and koa-session, I found a place where I could intervene:
That spot is where the egg-session-redis plugin writes the session to Redis for me, so I just need to add a check before the redis.set call.
There were two ways to get what I wanted; the one worth describing here is that egg.js lets you create an app.js in the project root exposing several lifecycle hooks, among them didLoad, which fires once all configuration files and plugins have been loaded. In didLoad I can simply overwrite app.sessionStore with a wrapped version. Given the time and complexity involved, I chose this app.js route without hesitation.
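To make that concrete, here is a minimal sketch of the app.js approach. It assumes egg-session-redis has already installed a get/set/destroy store on app.sessionStore by the time didLoad runs, and the userId field standing in for "this session belongs to a logged-in user" is purely illustrative; the exact store signature and the check depend on your plugin versions and session shape:
// app.js in the project root: a hedged sketch, not the exact production patch
class AppBootHook {
  constructor(app) {
    this.app = app;
  }

  // didLoad fires after all config files and plugins have loaded,
  // so egg-session-redis has already set up app.sessionStore
  didLoad() {
    const rawStore = this.app.sessionStore;

    this.app.sessionStore = {
      async get(...args) {
        return rawStore.get(...args);
      },

      // Only persist sessions that actually carry login info;
      // anonymous visitors no longer leave a record in Redis.
      async set(key, value, ...rest) {
        if (!value || !value.userId) return; // `userId` is an illustrative field
        return rawStore.set(key, value, ...rest);
      },

      async destroy(...args) {
        return rawStore.destroy(...args);
      },
    };
  }
}

module.exports = AppBootHook;
With this in place, an anonymous request still gets its throwaway cookie for the duration of the request, but nothing is written to Redis, so the IP-pool crowd can no longer inflate the keyspace.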
The end
In the end, the incident was resolved smoothly: Redis memory dropped back down and held at a normal level. This round of troubleshooting also taught me several interesting and practical Linux commands, such as xargs, shuf, and awk; using them fluently is a great help. OK, finally, I hope we can all keep improving together. I'm Jamki.