I thought I was quite familiar with Redis commands and all the clever tricks you can build on its data models. But recently I fell into a pit with the Redis SCAN command, and suddenly discovered that my understanding of Redis cursors was very limited.

So let me record how I stepped into this pit. First, the background:

The company needed to delete some useless keys that had no expiration date, because the Redis server was running out of memory. There were about 5 million such keys. The number sounds scary, but I had been playing with Redis for years, so this should be easy.

My plan at the time was to filter out the 5 million keys with a Lua script and then delete them. Lua scripts execute on the Redis server, so they run fast, and a whole batch of Lua calls needs only a single connection to the server. The script would filter the keys and delete 10,000 per call, and a shell script would loop it 500 times to delete them all. I had used Lua scripts for similar batch updates before, at about 30,000 operations per second with almost no blocking of Redis. By that math, the 5 million keys would be gone in about 10 minutes.

So I started writing the Lua script. First came the filtering.

If you have used Redis, you know that it executes commands on a single thread, so you cannot filter with the KEYS command: KEYS searches the entire keyspace in one shot, which blocks Redis and stalls the execution of normal business commands.

With 5 million keys, the only option is incremental iteration, and Redis provides the SCAN command for exactly that. It returns a small batch of elements per call, so it is well suited to iterating over large data sets and is safe to use in production environments.

The SCAN command returns an array whose first item is the new cursor position and whose second item is the list of keys. When the cursor reaches the end, the first item is 0.
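For reference, a raw SCAN call from redis-cli looks roughly like this (the cursor values and key names here are purely illustrative; they will differ on a real instance):

```
127.0.0.1:6379> SCAN 0 MATCH authToken* COUNT 5
1) "48"
2) 1) "authToken_11"
   2) "authToken_7"
127.0.0.1:6379> SCAN 48 MATCH authToken* COUNT 5
1) "0"
2) 1) "authToken_3"
```

The first reply element is the cursor to pass to the next call; once it comes back as "0", the iteration is complete.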

So I wrote the first version of the Lua script as follows:

local c = 0
local resp = redis.call('SCAN',c,'MATCH','authToken*','COUNT',10000)
c = tonumber(resp[1])
local dataList = resp[2]

for i=1,#dataList do
    local d = dataList[i]
    local ttl = redis.call('TTL',d)
    if ttl == -1 then
        redis.call('DEL',d)
    end
end

if c==0 then
    return 'all finished'
else
    return 'end'
end

In my local test Redis environment, I mocked 200,000 keys of test data by executing the following command:

eval "for i = 1, 200000 do redis.call('SET','authToken_' .. i,i) end" 0

Then I ran SCRIPT LOAD to upload the Lua script and get its SHA1 digest, and ran it with EVALSHA. The specific process was as follows:
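Concretely, the upload-and-run steps look like this from the command line (the script filename and the SHA1 shown are made up for illustration):

```
$ redis-cli SCRIPT LOAD "$(cat del_token.lua)"
"e0e1f9fabfc9d4800c877a703b823ac0578ff831"
$ redis-cli EVALSHA e0e1f9fabfc9d4800c877a703b823ac0578ff831 0
$ redis-cli DBSIZE
```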

Each execution should delete 10,000 keys, so after every run I checked DBSIZE (since this was my local Redis containing only the mock data, DBSIZE equals the number of keys with the prefix).

Oddly enough, the first few runs looked normal. But on the third run, DBSIZE came out as 169,999: one key more than expected had been deleted. I didn't pay much attention to it, but in the end, when DBSIZE reached 124,204, the number stopped changing. No matter how many more times I executed the script, it stayed at 124,204.

Then I ran the SCAN command directly:

The cursor had not reached the end, yet the list of keys was empty.

The result stunned me for a while. I checked the Lua script carefully and found nothing wrong. Did the Redis SCAN command have a bug? Or was something wrong with my understanding?

I went back to the Redis command documentation for its explanation of the COUNT option:

After studying it in detail, I learned that the number of elements returned is not fixed, even with COUNT specified. I suspected COUNT was the problem, but the documentation's explanation was genuinely hard to parse, and I still couldn't see where the problem was.

Later, after a hint from a friend, I found a more accessible explanation of the SCAN command's COUNT option:

After reading it, everything clicked. The number after the COUNT option is not the number of elements returned per call, but the number of dictionary slots (hash-table buckets) the SCAN command traverses per call.

Every time I ran SCAN, I started from cursor 0 again, and not every dictionary slot holds data matching my filter. That explains the final phenomenon: although I passed COUNT 10000, Redis traversed the first 10,000 dictionary slots from the beginning and found none holding the data I needed, so my DBSIZE stayed stuck at 124,204 forever.
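To make the failure mode concrete, here is a minimal Python sketch of SCAN's slot-based semantics. It is a deliberate simplification with a linear cursor (Redis actually uses a reverse-binary-increment cursor), and the bucket layout and key names are invented for illustration:

```python
# Toy model of Redis SCAN: COUNT bounds the number of dictionary slots
# (hash-table buckets) visited per call, NOT the number of keys returned.
def scan(buckets, cursor, match, count):
    """Visit up to `count` buckets starting at `cursor`; return (next_cursor, keys)."""
    keys = []
    end = min(cursor + count, len(buckets))
    for slot in range(cursor, end):
        keys.extend(k for k in buckets[slot] if k.startswith(match))
    next_cursor = 0 if end == len(buckets) else end
    return next_cursor, keys

# 30 buckets; only the last 10 hold keys matching the filter.
buckets = [[] for _ in range(30)]
for i in range(20, 30):
    buckets[i].append("authToken_%d" % i)

# The bug: always restarting from cursor 0 with COUNT 10 scans the same
# empty slots every time -- a nonzero cursor comes back, but no keys.
cur, keys = scan(buckets, 0, "authToken", 10)
print(cur, keys)   # 10 []

# The fix: feed each returned cursor into the next call until it is 0.
cur, found = 0, []
while True:
    cur, keys = scan(buckets, cur, "authToken", 10)
    found.extend(keys)
    if cur == 0:
        break
print(len(found))  # 10
```

The restart-from-zero loop never makes progress past the empty slots, which is exactly why DBSIZE stopped moving.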

Therefore, when using SCAN for an iterative traversal, each call must pass the cursor returned by the previous call as its cursor argument, so that the iteration continues where it left off.

With that doubt resolved, I revised the Lua script:

-- ARGV[1] is the cursor returned by the previous call (0 on the first call)
local c = tonumber(ARGV[1])
local resp = redis.call('SCAN',c,'MATCH','authToken*','COUNT',10000)
-- resp[1] is the next cursor, resp[2] is the list of matched keys
c = tonumber(resp[1])
local dataList = resp[2]

for i=1,#dataList do
    local d = dataList[i]
    local ttl = redis.call('TTL',d)
    -- a TTL of -1 means the key exists but has no expiration set
    if ttl == -1 then
        redis.call('DEL',d)
    end
end

-- return the next cursor so the caller can continue; 0 means the iteration is done
return c

After uploading it locally, I executed it:

As you can see, SCAN does not guarantee that the number of keys returned per call matches COUNT exactly, but the whole iteration proceeds nicely, and eventually the returned cursor is 0, which marks the end. With that, the 200,000 test keys were completely deleted.

This Lua script can now run directly in production, driven by a shell loop. By my estimate, the 5 million keys can be deleted in about 12 minutes.
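The shell loop itself can be sketched as follows (the script filename is an assumption; the essential point is feeding each returned cursor back into the next EVALSHA call until it comes back as 0):

```shell
#!/bin/sh
# Load the Lua script once and keep its SHA1 for EVALSHA.
sha=$(redis-cli SCRIPT LOAD "$(cat del_token.lua)")

cursor=0
while : ; do
    # The script takes the previous cursor as ARGV[1] and returns the next one.
    cursor=$(redis-cli EVALSHA "$sha" 0 "$cursor")
    [ "$cursor" = "0" ] && break
done
```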

Know not only what works, but why it works. I had played with the SCAN command before, yet I really didn't know the details. On top of that, the translated documentation was inaccurate enough that I wasted nearly an hour staring at the wrong result. Writing it down helps me understand it that much better.