Redis series of articles by Zxiaofan focuses on the basic and advanced applications of Redis. This is the first article in the Redis series [15]. Please visit the public account “Zxiaofan” or search for “Redis zxiaofan” on Baidu.

Keywords: Playing with Redis, export keys without an expiration time, delete keys without an expiration time

Redis: How to Import, Export, and Delete Large Amounts of Data in a Production Environment

Outline

  • How to query the number of keys in Redis that have no expiration time set
  • How to export the keys in Redis that have no expiration time set
  • How to safely delete the keys in Redis that have no expiration time set

Some time ago, the company had a new business that needed Redis, so I checked the usage of one of the production Redis clusters to evaluate whether the new business could use it directly. This cluster is an Alibaba Cloud community-edition cluster with 8 nodes and 32 GB.

I had no idea until I looked, and the look gave me a shock. The Redis instance holds 4.5 million keys, of which 2.3 million have an expiration time set, which means 2.2 million keys have none. What!!! Nearly 50% of the data has no expiration time, which is completely unreasonable and a serious waste.

I won't call myself a programmer until these holdouts ("nail households") are pulled out.

The GitHub link is at the end of this article.

1. How to query the number of keys in Redis whose expiration time is not set

1.1 Use the Alibaba Cloud [cloud database Redis version management console]

If the Redis cluster is provided by a cloud vendor, the vendor typically provides a management console. In the Alibaba Cloud console, search for “cloud database Redis version” to enter the [cloud database Redis version management console].

1.1.1. View Key changes of clusters or nodes

  • Enter Performance Monitoring; Data Node Aggregation Indicator is selected by default. It shows a line chart over the selected time range.
    • The aggregation indicator shows data for the entire cluster.
    • You can adjust the query time to widen the range of data analyzed.
    • You can also select “Data Node” to view the data of a specific node.
  • Enter CloudDBA, then Performance Trend, and view the line chart for the selected time range.
    • In the [Keys] column, you can view the total number of keys, the number of keys with an expiration time set, and the number of expired keys.

1.1.2. View the real-time performance of the Global Node

  • Enter “CloudDBA”, enter “Real-time Performance”, select “Real-time Performance of global Nodes”;
  • Automatic Refresh (5s) is selected by default to view the real-time performance of each node.

1.1.3 Description of cloud database Redis monitoring indicators

Metric | Unit | Description
Keys | Counts | Total number of level-1 keys stored by the instance
Expires | Counts | Number of keys in the instance that have an expiration time set; this is the instantaneous value at collection time
ExpiredKeys | Counts | Cumulative number of keys removed because they expired
EvictedKeys | Counts | Cumulative number of keys evicted
ExpiredKeysPerSecond | Counts/s | Number of keys removed per second because they expired
EvictedKeysPerSecond | Counts/s | Number of keys evicted per second

Reference address: help.aliyun.com/document_de… .

1.2. Use Redis command line to view

Run the “info keyspace” command. In the output, keys is the total number of keys (approximately 4.55 million) and expires is the number of keys that have an expiration time set (approximately 2.4 million).

127.0.0.1:6379> info keyspace
# Keyspace
db0:keys=4551001,expires=2405155,avg_ttl=219009799007

The drawback of the command line is that it cannot show how the numbers change over a period of time (of course, the operations team can build its own monitoring system; Prometheus and ELK are good choices), so the consoles that cloud vendors provide are a great convenience.
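If you do build your own monitoring, the keyspace line is easy to parse. The following is a minimal sketch, not part of the original scripts; the function name parse_keyspace_line is hypothetical, and it assumes every field in the line is an integer, as in the output shown above.

```python
def parse_keyspace_line(line):
    """Parse one 'dbN:keys=...,expires=...,avg_ttl=...' line from INFO keyspace."""
    db, _, fields = line.strip().partition(":")
    stats = {}
    for field in fields.split(","):
        name, _, value = field.partition("=")
        stats[name] = int(value)
    # Keys without an expiration time = all keys minus keys that have a TTL.
    stats["no_ttl"] = stats["keys"] - stats["expires"]
    return db, stats


db, stats = parse_keyspace_line("db0:keys=4551001,expires=2405155,avg_ttl=219009799007")
print(db, stats["no_ttl"])  # db0 2145846
```

Sampling this value at a fixed interval and storing it gives the trend that a single command-line call cannot.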

2. Export keys whose expiration time is not set in Redis

It was painful to think that nearly 50% of the resources were wasted, so I had to export these holdouts, find the culprit, and get rid of it.

I originally assumed Alibaba Cloud would offer such a function, but after checking the official website and asking Alibaba Cloud technical support, I found that it does not, so I had to build it myself.

Because this is a production environment, safety and stability are especially important, and the export must not affect normal use. After searching for related material online, I ended up with the following two script solutions:

  • Export data using shell scripts;
    • No long-lived connection, so it is relatively slow, and the large number of connections has some impact on Redis;
    • If the data volume is large, set the sleep time appropriately.
  • Export data using Python scripts.

2.1. Use shell scripts to export data

Some script parameters are configurable; modify the following as required:

  • db_ip: Redis connection address;
  • db_port: Redis connection port, which can be passed in as a parameter.
  • password: Redis connection password, which can be passed in as a parameter.
  • cursor: the cursor for the first SCAN iteration. The default value is 0.
  • cnt: the COUNT passed to each SCAN call. The default value is 1000; tune it according to the production situation.

Note that:

  • By default, the script sleeps for 0.1 seconds after each batch of 1000 keys is scanned (SCAN plus TTL).
  • Shell scripts cannot keep a long-lived connection, so every TTL call opens a new connection, which affects performance.
  • One idea is to batch the TTL calls with a Lua script to improve performance. Interested readers are welcome to implement it and share feedback.
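The batching idea in the last note can also be approached without Lua. The sketch below swaps in a different technique, a redis-py pipeline, so that all TTL calls for one SCAN batch travel in a single round trip over one connection. It is only an illustration under assumptions: the function name find_keys_without_ttl is hypothetical, and the client is expected to behave like a redis-py client (scan, pipeline, ttl, execute).

```python
def find_keys_without_ttl(client, batch_size=1000):
    """Yield keys whose TTL is -1, checking each SCAN batch in one pipeline."""
    cursor = 0
    while True:
        # SCAN returns the next cursor and one batch of keys.
        cursor, keys = client.scan(cursor=cursor, count=batch_size)
        if keys:
            pipe = client.pipeline(transaction=False)
            for key in keys:
                pipe.ttl(key)  # queued locally, not sent yet
            # execute() sends the queued TTL calls together and returns
            # the replies in the same order as the keys.
            for key, ttl in zip(keys, pipe.execute()):
                if ttl == -1:  # -1 means the key exists and has no expiration time
                    yield key
        if cursor == 0:  # a returned cursor of 0 ends the iteration
            break


# Example (assumes the redis-py package and a reachable server):
#   import redis
#   client = redis.StrictRedis(host="127.0.0.1", port=6379, decode_responses=True)
#   for key in find_keys_without_ttl(client):
#       print(key)
```

A sleep between batches, as in the scripts in this article, can be added inside the loop to limit the load on a production instance.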

Script execution:

The test ran against a stand-alone Redis with 300,000 keys in total. With the default sleep frequency (0.1 seconds of sleep per batch of scanned keys) it took about 12 minutes. The keys found are saved in no_ttlkey.log in the current directory.

[redis@xxx redis]$ time ./checknottl.sh 6378 password
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
scan_num: 0

real	12m14.810s
user	4m14.343s
sys	5m41.646s

Save the following script as checknottl.sh and run chmod u+x *.sh to grant execute permission:

#!/bin/bash
# Find the keys in Redis that have no expiration time set;
# script address: https://github.com/zxiaofan/OpenSource_Study/tree/master/redis_scripts;
# The script calls ./redis-cli, so adjust that path to match your environment;
# checknottl.sh: by default, sleeps 0.1 seconds after each batch of 1000 keys is scanned (SCAN + TTL);
# The script is adapted from one found online, with some adjustments and optimizations; more of the Redis series: https://blog.csdn.net/u010887744/category_9356949.html;
# Note:
# Shell scripts cannot keep a long-lived connection, so every TTL call opens a new connection, which hurts performance.
# One idea is to batch the TTL calls with a Lua script to improve performance; interested readers are welcome to implement it and share feedback.

db_ip=127.0.0.1      # Redis ip
db_port=$1           # Redis port
password=$2          # Redis password
cursor=0             # first cursor
cnt=1000             # COUNT passed to each SCAN call
new_cursor=0         # next cursor
scan_num=0           # Number of scanned keys

./redis-cli -h $db_ip -p $db_port -a $password scan $cursor count $cnt > scan_tmp_result
new_cursor=`sed -n '1p' scan_tmp_result`             # Get the next cursor
sed -n '2,$p' scan_tmp_result > scan_result          # Get the keys

cat scan_result |while read -r line                  # Loop through all keys
do
    ttl_result=`./redis-cli -h $db_ip -p $db_port -a $password ttl "$line" 2>/dev/null`      # Use the TTL command to get the key's expiration time
    # $scan_num +=1;
    if [[ $ttl_result == -1 ]];then                  # -1 means no expiration time
        echo "$line" >> no_ttlkey.log                # Append to the output file
    fi
done

echo 'scan_num: '$scan_num
while [ $cursor -ne $new_cursor ]    # A next cursor other than 0 means not all keys have been iterated yet
do
    ./redis-cli -h $db_ip -p $db_port -a $password scan $new_cursor count $cnt 2>/dev/null > scan_tmp_result
    new_cursor=`sed -n '1p' scan_tmp_result`
    sed -n '2,$p' scan_tmp_result > scan_result
    cat scan_result |while read -r line
    do
        ttl_result=`./redis-cli -h $db_ip -p $db_port -a $password ttl "$line" 2>/dev/null`
        # $scan_num +=1;
        if [[ $ttl_result == -1 ]];then
            echo "$line" >> no_ttlkey.log
        fi

        #if [ $scan_num % 1000 == 0 ]; then
        #	sleep 0.5
        #fi
    done
    sleep 0.1
done
rm -f scan_tmp_result
rm -f scan_result

2.2. Use Python scripts to export data

Why use Python? As mentioned above, the shell script cannot keep a long-lived connection. When Redis holds a lot of data, the many TTL calls create a large number of connections, which affects the performance and stability of Redis.

Some parameters of the script can be passed on the command line, and the script can be modified as needed:

  • -host: Redis connection address. The default value is 127.0.0.1.
  • -p: the Redis connection port.
  • -d: the DB to scan. By default, only DB 0 is scanned.
  • -a: the Redis connection password.
  • -sn: the number of scanned keys after which the script sleeps (0.5 seconds in the code below).

Script execution:

The test ran against a stand-alone Redis with 300,000 keys in total. With the default sleep frequency (0.5 seconds of sleep per 1K keys scanned; adjust as needed) it took about 4 minutes. The resulting keys are saved in {port}_{db}_no_ttl_keys.txt in the current directory.

As you can see, although the Python version sleeps longer per batch (0.5 seconds versus 0.1 seconds), its performance is much better. You can also use any other language that can maintain a connection pool.

[redis@xxx redis]$ python checknottl.py -p 6378 -d 0 -a password
there are 300005 keys in db[0] 
startTime of db[0] is :2020-12-06 10:40:09
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 100.00%
done
endTime of db[0] is :2020-12-06 10:44:24
It takes 254.684270859 seconds :
no ttl keys number is : 300002
the file of keys with no ttl: ./6378_0_no_ttl_keys.txt

Save the following script as checknottl.py and run the command above after installing Python. The script uses Python 2 syntax (print statements, xrange).

# encoding: utf-8
"""
modify: zxiaofan
date: 2020-12-05
func: find Redis keys that have no TTL
The script is adapted from one found online (author: Yang Qilong), with some adjustments and optimizations.
More of the Redis series: https://blog.csdn.net/u010887744/category_9356949.html;
Script address: https://github.com/zxiaofan/OpenSource_Study/tree/master/redis_scripts;
Note: if you get "ImportError: No module named redis", run: python -m pip install redis
The script sleeps 0.5 seconds after each batch of scanned keys (batch size set by -sn).
"""
import redis
import argparse
import time
import sys


class ShowProcess:
    """
    Displays processing progress; call the methods of this class to show a progress bar.
    """
    i = 0  # Current processing progress
    max_steps = 0  # Total number of steps to process
    max_arrow = 50  # Length of the progress bar

    # Initializer; needs the total number of steps to process
    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.i = 0

    # Displays the bar according to the current progress i
    # Looks like: [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 100.00%
    def show_process(self, i=None):
        if i is not None:
            self.i = i
        else:
            self.i += 1
        num_arrow = int(self.i * self.max_arrow / self.max_steps)  # how many '>' to display
        num_line = self.max_arrow - num_arrow  # how many blanks to display
        percent = self.i * 100.0 / self.max_steps  # completion percentage in xx.xx% format
        process_bar = '[' + '>' * num_arrow + ' ' * num_line + '] '\
                      + '%.2f' % percent + '%' + '\r'  # string to output; '\r' returns to the start of the line without a newline
        sys.stdout.write(process_bar)  # these two lines print the string to the terminal
        sys.stdout.flush()

    def close(self, words='done'):
        print ' '
        print words
        self.i = 0


def check_ttl(redis_conn, no_ttl_file, dbindex, scannum_thensleep):
    start_time = time.time()
    no_ttl_num = 0
    scan_num = 0
    keys_num = redis_conn.dbsize()
    print "there are {num} keys in db[{index}] ".format(num=keys_num, index=dbindex)
    # Print the start time of the DB scan
    print "startTime of db[{index}] is :{start_time}".format(index=dbindex, start_time=time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
    process_bar = ShowProcess(keys_num)
    with open(no_ttl_file, 'a') as f:

        for key in redis_conn.scan_iter(count=1000):
            process_bar.show_process()
            if redis_conn.ttl(key) == -1:
                no_ttl_num += 1
                # Only the first 999 keys without a TTL are written to the file; remove this check to export them all
                if no_ttl_num < 1000:
                    f.write(key + '\n')

            # Count every scanned key, so the sleep interval follows the number of keys scanned
            scan_num += 1
            if scan_num % scannum_thensleep == 0:  # sleep after every scannum_thensleep scanned keys
                time.sleep(0.5)

    process_bar.close()
    # Print the end time of the DB scan
    print "endTime of db[{index}] is :{end_time}".format(index=dbindex, end_time=time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
    print "It takes {sec} seconds :".format(sec=(time.time() - start_time))
    print "no ttl keys number is :", no_ttl_num
    print "the file of keys with no ttl: %s" % no_ttl_file


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-p', type=int, dest='port', action='store', help='port of redis ')
    # The default is the string '0' so that .split() below also works when -d is omitted
    parser.add_argument('-d', type=str, dest='db_list', action='store', default='0', help='ex: -d all or -d 1,2,3,4 ')
    parser.add_argument('-host', type=str, dest='host', action='store', default='127.0.0.1', help='Redis host')  # 20201205, support connecting to a remote Redis
    parser.add_argument('-a', type=str, dest='password', action='store', default=None, help='Redis password')  # 20201205, support passing in the Redis password
    parser.add_argument('-sn', type=str, dest='scannum_thensleep', action='store', default=2000, help='After the number of scanned keys reaches [scannum_thensleep], sleep for 0.5 seconds ')  # 20201205, support sleeping after scanning the specified number of keys
    args = parser.parse_args()
    port = args.port
    host = args.host
    password = args.password
    scannum_thensleep = int(args.scannum_thensleep)

    if args.db_list == 'all':
        db_list = [i for i in xrange(0, 16)]
    else:
        db_list = [int(i) for i in args.db_list.split(',')]
    db_list = list(set(db_list))  # 20201205, deduplicate in case the user enters the same DB number more than once

    for index in db_list:
        try:
            pool = redis.ConnectionPool(host=host, port=port, db=index, password=password)  # 20201205, support passing in the Redis password
            r = redis.StrictRedis(connection_pool=pool)
        except redis.exceptions.ConnectionError as e:
            print e
        else:
            no_ttl_keys_file = "./{port}_{db}_no_ttl_keys.txt".format(port=port, db=index)
            check_ttl(r, no_ttl_keys_file, index, scannum_thensleep)


if __name__ == '__main__':
    main()

3. Safely delete keys in Redis that do not have an expiration date

With either of the two methods above we can now get the keys that have no expiration time. The next step is to analyze the data, figure out which keys are the holdouts, and remove them.
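One simple way to start that analysis is to group the exported keys by prefix and see which business owns most of them. This is a minimal sketch under an assumption the article does not state: that keys are named with a ":" separator (for example order:123). The names count_by_prefix and load_keys are hypothetical helpers, not part of the original scripts.

```python
from collections import Counter


def count_by_prefix(keys, sep=":"):
    """Count keys per prefix; a key without the separator is its own prefix."""
    return Counter(key.split(sep, 1)[0] for key in keys)


def load_keys(path):
    """Read one key per line from an exported file, skipping blank lines."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


# Made-up sample keys, for illustration only.
sample = ["order:1", "order:2", "user:9", "session"]
print(count_by_prefix(sample).most_common(1))  # [('order', 2)]
```

In practice you would pass load_keys("./6378_0_no_ttl_keys.txt") (the export file from the Python script) instead of the sample list and look at the top few prefixes.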

# deletedata.txt

del key0
del key-1
del key-2
del key-3

UNLINK is recommended for large keys:

  • DEL: a blocking operation.
  • UNLINK: not always blocking. If the value is small, the key is freed immediately, just as with DEL; if the value is large, the key is queued and its memory is released by another thread.

# deletedata.txt

UNLINK key0
UNLINK key-1
UNLINK key-2
UNLINK key-3

Since UNLINK supports multiple parameters (as does DEL), further optimizations can be made:

# deletedata.txt

UNLINK key0 key-1 key-2 key-3 key-4 key-5
UNLINK key-6 key-7 key-8 key-9 key-10 key-11
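A file in this multi-key form can be generated from the list of keys to delete. The following is a minimal sketch; build_unlink_commands and write_commands are hypothetical helpers, and the code assumes the keys contain no whitespace or quote characters, since those would need escaping in a command file.

```python
def build_unlink_commands(keys, batch_size=6):
    """Group keys into 'UNLINK k1 k2 ...' lines with batch_size keys per line."""
    commands = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        commands.append("UNLINK " + " ".join(batch))
    return commands


def write_commands(commands, path="deletedata.txt"):
    """Write one command per line to the command file."""
    with open(path, "w") as f:
        for command in commands:
            f.write(command + "\n")


for line in build_unlink_commands(["key0", "key-1", "key-2"], batch_size=2):
    print(line)
# UNLINK key0 key-1
# UNLINK key-2
```

Keeping batch_size moderate keeps each command short, so a single UNLINK does not touch too many keys at once.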

Finally, we can run the deletion through the redis-cli pipe mode. If the amount of data is really large, it is recommended to delete in batches.

# Delete using pipe mode
cat deletedata.txt | redis-cli -c --pipe

See my previous article, “Playing With Redis- How to Import, Export, and Delete Large amounts of Data in a production environment.”

The latest scripts can be viewed at Github: github.com/zxiaofan/Op… .

Refer to the article: blog.itpub.net/22664653/vi… www.cnblogs.com/klvchen/p/1…

【Recently selected articles from the Redis series, public account @zxiaofan】

“Playing with Redis: 8 Data Eviction Strategies and the Principles of Approximate LRU and LFU”

“Playing with Redis: How to Import, Export, and Delete Large Amounts of Data in a Production Environment”

“Playing with Redis: Deleted 2 Million Keys, Why Is the Memory Still Not Freed?”

“Playing with Redis: The Use and Principle of the Bloom Filter”

“Playing with Redis: Exploring the Principle of HyperLogLog”


Check out the latest series of articles on zxiaofan. Life is all about choices! In the future, you will be grateful for your hard work now!