This article covers two Redis commands, SCAN and DEL, and how to use them from Python:

  • The SCAN command
  • The DEL command
  • Using SCAN from Python
  • Using DEL from Python
  • Results

The SCAN command

The SCAN command and the related SSCAN, HSCAN, and ZSCAN commands are used to incrementally iterate over a collection of elements:

  • SCAN iterates over the keys in the current database
  • SSCAN iterates over the elements of a set key
  • HSCAN iterates over the field-value pairs of a hash key
  • ZSCAN iterates over the elements of a sorted set (both the members and their scores)

All four commands support incremental iteration, returning only a small number of elements per call, so they can be used in production without the risk of blocking the server the way KEYS and SMEMBERS can.



However, the incremental iteration commands are not without drawbacks:

For example, SMEMBERS returns all the elements a set key contains at the moment it is called. With incremental iteration commands such as SCAN, the key can be modified while the iteration is in progress, so these commands offer only limited guarantees about the returned elements.



The SCAN, SSCAN, HSCAN, and ZSCAN commands all work very similarly, but remember:

  • The first argument to the SSCAN, HSCAN, and ZSCAN commands is always a database key;
  • The SCAN command takes no database key as its first argument, because it iterates over all the keys in the current database.


Basic usage of the SCAN command

The SCAN command is a cursor-based iterator:

Each time the SCAN command is invoked, a new cursor is returned to the user. The user needs to use this new cursor as the cursor parameter of the SCAN command in the next iteration to continue the previous iteration.

When the cursor parameter of the SCAN command is set to 0, the server starts a new iteration, and the iteration ends when the server returns a cursor with a value of 0 to the user.

Example:

redis 127.0.0.1:6379> scan 0
1) "17"
2)  1) "key:12"
    2) "key:8"
    3) "key:4"
    4) "key:14"
    5) "key:16"
    6) "key:17"
    7) "key:15"
    8) "key:10"
    9) "key:3"
    10) "key:7"
    11) "key:1"

redis 127.0.0.1:6379> scan 17
1) "0"
2) 1) "key:5"
   2) "key:18"
   3) "key:0"
   4) "key:2"
   5) "key:19"
   6) "key:13"
   7) "key:6"
   8) "key:9"
   9) "key:11"

In the example above, the first iteration uses 0 as a cursor to indicate the start of the first iteration.

The second iteration uses the cursor returned from the first iteration, that is: 17.

As you can see from the example, the SCAN command returns an array of two elements. The first element is the new cursor, and the second element is itself an array containing the elements returned by this iteration.

When the SCAN command is called the second time, the cursor 0 is returned, indicating that the iteration is complete and the entire collection has been traversed.

This process is called a full iteration.
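The cursor protocol described above can be sketched in Python. This is only a minimal sketch: it assumes a redis-py style client whose scan(cursor=..., count=...) method returns a (next_cursor, keys) pair with the cursor as an integer, and the helper name iter_keys is made up for illustration.

```python
def iter_keys(client, count=None):
    """Yield every key from one full SCAN iteration.

    `client` is assumed to expose a redis-py style
    scan(cursor=..., count=...) -> (next_cursor, keys) method.
    """
    cursor = 0  # a cursor of 0 starts a new iteration
    while True:
        # Each call returns the cursor to pass to the next call
        cursor, keys = client.scan(cursor=cursor, count=count)
        for key in keys:
            yield key
        if cursor == 0:  # the server returned 0: full iteration done
            break
```

With a real connection this might be used as `for key in iter_keys(r): ...`, where r is a redis-py client. redis-py also provides a scan_iter() method that wraps the same kind of loop.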



Three more points are worth adding:

  1. Because the SCAN command keeps its iteration state only in the cursor, an element that is added or removed during the iteration may or may not be returned, and in some cases the same element can be returned more than once. It is therefore best if the operation performed on the returned elements can safely be repeated (that is, it is idempotent).
  2. The incremental iteration commands do not guarantee how many elements each call returns, but the COUNT option can adjust this behavior to some extent. The default COUNT is 10. When iterating over the database, or over a set, hash, or sorted set large enough to be implemented as a hash table, and no MATCH option is used, the command usually returns COUNT elements or slightly more. When iterating over a set encoded as an integer set (intset: a small set made up only of integers), or over a hash or sorted set encoded as a compressed list (ziplist: a small hash or sorted set made up of small values), the COUNT option is ignored and all elements are returned on the first call.
  3. The MATCH option restricts the returned elements to those matching a glob-style pattern, as the example below shows.

Example:

redis 127.0.0.1:6379> sadd myset 1 2 3 foo foobar feelsgood
(integer) 6

redis 127.0.0.1:6379> sscan myset 0 match f*
1) "0"
2) 1) "foo"
   2) "feelsgood"
   3) "foobar"

Note: Pattern matching is applied after the command has fetched elements from the dataset and before they are returned to the client, so if few elements match the pattern, a call may return an empty result.

Example:

redis 127.0.0.1:6379> scan 0 MATCH *11*
1) "288"
2) 1) "key:911"

redis 127.0.0.1:6379> scan 288 MATCH *11*
1) "224"
2) (empty list or set)

redis 127.0.0.1:6379> scan 224 MATCH *11*
1) "80"
2) (empty list or set)

redis 127.0.0.1:6379> scan 80 MATCH *11*
1) "176"
2) (empty list or set)

redis 127.0.0.1:6379> scan 176 MATCH *11* COUNT 1000
1) "0"
2)  1) "key:611"
    2) "key:711"
    3) "key:118"
    4) "key:117"
    5) "key:311"
    6) "key:112"
    7) "key:111"
    8) "key:110"
    9) "key:113"
   10) "key:211"
   11) "key:411"
   12) "key:115"
   13) "key:116"
   14) "key:114"
   15) "key:119"
   16) "key:811"
   17) "key:511"
   18) "key:11"

Note: On the last iteration, the COUNT option was specified as 1000 to force the command to scan more elements for this iteration, resulting in more elements being returned.
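Why a MATCH scan can come back empty is easier to see with a toy model. The sketch below is not how Redis implements SCAN (the real cursor walks hash table buckets). toy_scan is a hypothetical stand-in that steps through a Python list and applies the pattern only after each batch has been fetched, with fnmatch used as an approximation of Redis glob patterns.

```python
from fnmatch import fnmatchcase


def toy_scan(keys, cursor, pattern="*", count=10):
    """Toy model of SCAN: fetch `count` keys, then filter by pattern.

    Returns (next_cursor, matching_keys); next_cursor is 0 when done.
    """
    batch = keys[cursor:cursor + count]  # step 1: fetch a batch
    matched = [k for k in batch if fnmatchcase(k, pattern)]  # step 2: filter
    next_cursor = cursor + count
    if next_cursor >= len(keys):
        next_cursor = 0  # the whole list has been traversed
    return next_cursor, matched


keys = ["key:%d" % i for i in range(1000)]
cursor, batches = 0, []
while True:
    cursor, matched = toy_scan(keys, cursor, pattern="*11*", count=10)
    batches.append(matched)
    if cursor == 0:
        break

empty = sum(1 for b in batches if not b)
print(len(batches), "calls,", empty, "of them returned nothing")
```

Raising count makes each call inspect more keys, so fewer calls come back empty, which mirrors the effect of COUNT 1000 in the transcript.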

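The DEL command

This one is simpler: it deletes one or more given keys. Keys that do not exist are ignored, and the command returns the number of keys that were actually deleted.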

redis> SET name "redis"
OK
redis> SET type "key-value store"
OK
redis> SET website "redis.com"
OK
redis> DEL name type website
(integer) 3

Using SCAN from Python

Install the redis package:

pip install redis

Complete example:

import redis

pool = redis.ConnectionPool(host='redis_hostname', port=6379, max_connections=100)
r = redis.StrictRedis(connection_pool=pool)

cursor_number = 0
while True:
    # r.scan() returns the next cursor (as an int) and one batch of keys
    cursor_number, keys = r.scan(cursor=cursor_number, count=200000)
    # do something with keys
    if cursor_number == 0:
        # A cursor of 0 means the full iteration is complete
        break

I saved the keys to be deleted in a file of about 2.2 GB, roughly 40 million keys. The next step was to delete them.
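How the keys ended up in the file is not shown, so here is one way it could be done. The function name dump_keys, its parameters, and the default count are assumptions for illustration, and the client is assumed to expose redis-py's scan(cursor=..., match=..., count=...) returning (next_cursor, keys). Keys are deduplicated because SCAN may return the same key more than once.

```python
def dump_keys(client, path, match=None, count=1000):
    """Write every key returned by a full SCAN to `path`, one per line.

    Returns the number of distinct keys written.
    """
    seen = set()  # SCAN may repeat keys, so remember what was written
    cursor = 0
    with open(path, "w") as out:
        while True:
            cursor, keys = client.scan(cursor=cursor, match=match, count=count)
            for key in keys:
                # redis-py returns bytes unless decode_responses=True
                if isinstance(key, bytes):
                    key = key.decode("utf-8")
                if key not in seen:
                    seen.add(key)
                    out.write(key + "\n")
            if cursor == 0:  # full iteration finished
                break
    return len(seen)
```

With tens of millions of keys the seen set itself becomes large. Since DEL simply ignores keys that no longer exist, dropping the deduplication and tolerating a few repeated lines is also an option.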

Using DEL from Python

Because the file is so large, we use a small trick and read it in chunks. When readlines() is given a size hint, it returns only about that many bytes' worth of complete lines per call:

with open("/data/rediskeys") as kf:
    lines = kf.readlines(1024*1024)
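To see what the size hint does, here is a small self-contained sketch. It uses an in-memory io.StringIO in place of the real key file and a hint of 64 instead of 1024*1024 so that the effect is visible. Both are stand-ins chosen for illustration, as is the helper name read_in_chunks.

```python
import io


def read_in_chunks(fileobj, hint):
    """Yield lists of complete lines, roughly `hint` bytes per list."""
    while True:
        lines = fileobj.readlines(hint)
        if not lines:  # an empty list means end of file
            break
        yield lines


# 100 fake keys, one per line, standing in for /data/rediskeys
fake_file = io.StringIO("".join("UCS:TASKKEY:%d\n" % i for i in range(100)))
chunks = list(read_in_chunks(fake_file, 64))

# Each chunk contains only whole lines; none are split across chunks
print(len(chunks), "chunks,", sum(len(c) for c in chunks), "lines in total")
```

Because every chunk ends on a line boundary, no key is ever cut in half, which is what makes this safe for a file with one key per line.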
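
Another trick, used when calling the delete method, is the asterisk (*), which unpacks the list so that each key is passed as a separate argument:

r.delete(*taskkey_list)

Let’s look at the definition: in redis-py, the method is declared as delete(self, *names) and deletes one or more keys specified by names.

Here is the full code: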

import redis
import time

pool = redis.ConnectionPool(host='redis_hostname', port=6379, max_connections=100)
r = redis.StrictRedis(connection_pool=pool)

start_time = time.time()
SUCCESS_DELETED = 0

with open("/data/rediskeys") as kf:
    while True:
        # Read roughly 1 MB of complete lines per pass
        lines = kf.readlines(1024*1024)
        if not lines:
            break
        taskkey_list = [i.strip() for i in lines if i.startswith("UCS:TASKKEY")]
        # DEL needs at least one key, so skip chunks with no matching keys
        if taskkey_list:
            SUCCESS_DELETED += r.delete(*taskkey_list)

        print(SUCCESS_DELETED)

end_time = time.time()
print(end_time - start_time, SUCCESS_DELETED)
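One thing to watch in the script above: at the roughly 55 bytes per key implied by a 2.2 GB file with 40 million keys, a 1 MB chunk can hold up to about 19,000 keys, all sent in a single DEL. A possible refinement, sketched here on the assumption that smaller commands are kinder to a busy server, is to cap the number of keys per call. The helper names chunked and delete_in_batches and the default batch size are made up for illustration; client is assumed to be a redis-py client.

```python
def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_in_batches(client, keys, batch_size=1000):
    """Delete `keys` with several smaller DEL calls; return the total deleted."""
    deleted = 0
    for batch in chunked(keys, batch_size):
        # client.delete(*names) returns how many of the keys existed
        deleted += client.delete(*batch)
    return deleted
```

In the loop above, `SUCCESS_DELETED += delete_in_batches(r, taskkey_list)` could stand in for the direct r.delete(*taskkey_list) call. An empty list results in no DEL call at all.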

Results

That's all. See you in the next post.
