1. Preface

Redis is famous for its high performance. But however fast it is, if it holds massive amounts of data and is used incorrectly, it will eventually hit performance bottlenecks and can even bring the service down.

Have you run into questions like these in real projects?

  • How can I access massive amounts of data in Redis without affecting other requests to Redis?
  • Redis holds millions or tens of millions of entries; how can I access them efficiently?
  • Redis holds too much data; how can I keep access fast without bringing the service down?

These questions are also frequently asked in Redis interviews.

The “Play Redis” series covers basic and advanced Redis usage and is based on Redis 5.0.4+. You can find the series by searching on CSDN, the WeChat subscription account, OSChina, Juejin, and other platforms.


2. Think about it

Q1: Why do some operations make Redis stall or even crash when it holds a large amount of data?

A1: Redis executes commands on a single thread, one after another. When one command takes a long time, every command queued behind it is blocked. As the backlog grows, Redis consumes more and more CPU, and the instance can crash or even take the whole server down with it.

Q2: Can I just use the all-powerful KEYS command to find any data I want?

A2: KEYS is fine for playing with tens of thousands of keys on your own machine, but running it in production? Excuse me? That is a good way to get shown the door. One news item put it vividly: “A PHP engineer at a company ran redis keys *, which brought the database down. The technology department had two major P0 incidents this year, resulting in a capital loss of RMB 4 million.” Alarm bells should be ringing!

Q3: How should massive amounts of data in Redis be handled correctly?

A3: Use the SCAN commands (SCAN, SSCAN, HSCAN, and ZSCAN) to complete data iteration.

How much do you know about Redis’s SCAN commands?

3. The SCAN commands in detail

The SCAN family consists of SCAN, SSCAN, HSCAN, and ZSCAN. Each command iterates over a different kind of object, but their usage and behavior are essentially the same.

3.1 Comparison of the SCAN commands

  • cursor: the iteration cursor;
  • MATCH pattern: only return elements that match the glob-style pattern;
  • COUNT count: a hint for how much work each call should do, roughly how many elements to return per call;

  • SCAN cursor [MATCH pattern] [COUNT count]: iterates over the keys of the current DB. Returns an array: the first value is the cursor for the next call (an unsigned 64-bit number), the second is the list of keys.
  • SSCAN key cursor [MATCH pattern] [COUNT count]: iterates over the members of a Set. Returns an array: the first value is the cursor for the next call (an unsigned 64-bit number), the second is the list of members.
  • HSCAN key cursor [MATCH pattern] [COUNT count]: iterates over the fields of a Hash. Returns an array whose second value is a flat field-value list.
  • ZSCAN key cursor [MATCH pattern] [COUNT count]: iterates over the members of a Sorted Set. Returns an array whose second value is a flat member-score list.

3.2 Notes on the SCAN commands

  • SCAN takes no key argument because it iterates over the keys of the current DB; SSCAN, HSCAN, and ZSCAN take the key of the collection to iterate;
  • Every command returns an array whose first value is the cursor for the next call;
  • Time complexity: O(1) per call, and O(N) for a complete iteration (enough calls for the cursor to return to 0), where N is the number of elements;
  • Available since: 2.8.0;

3.3 The SCAN commands explained

3.3.1 Incremental iteration, safe for production use

  • Unlike full-iteration commands such as KEYS and SMEMBERS, which can block the server for a long time on a large keyspace or collection, SCAN returns only a small batch per call, so it can be used in production.

3.3.2 Results are not guaranteed to be exact

  • SMEMBERS returns every element of a set at a single point in time. With incremental commands like SCAN, the collection may be modified between calls, so the result is not guaranteed to be an exact snapshot.

3.3.3 Cursor-based iteration

  • SCAN is cursor-based: each call returns the cursor to pass to the next call.
  • The cursor is an unsigned 64-bit value, not an element index, so it can be larger than the total number of elements in the DB.
  • Invalid cursors: a broken cursor (one not returned by a previous call), a negative or out-of-range value, or any other invalid cursor does not necessarily produce an error, but the behavior is undefined and the guarantees about the returned elements no longer hold;
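Why is a cursor not an element index? Redis's hash-table scan is generally described as walking buckets in reverse binary order: reverse the bits of the current bucket index, add one, and reverse back. The Python sketch below models only that step for a fixed table of 2**bits buckets. It assumes no rehashing or resizing, and the helper names are hypothetical, not Redis APIs.

```python
from typing import List


def reverse_bits(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def next_cursor(cursor: int, bits: int) -> int:
    """Advance a cursor over 2**bits buckets in reverse binary order.

    Hypothetical helper: a simplified model of a reverse-binary scan cursor
    for a fixed-size table; rehashing and resizing are ignored.
    """
    mask = (1 << bits) - 1
    reversed_cursor = reverse_bits(cursor & mask, bits)
    reversed_cursor = (reversed_cursor + 1) & mask  # increment the reversed value
    return reverse_bits(reversed_cursor, bits)      # 0 again means "done"


def full_cursor_sequence(bits: int) -> List[int]:
    """Every cursor visited by one full iteration that starts at 0."""
    sequence = [0]
    cursor = next_cursor(0, bits)
    while cursor != 0:
        sequence.append(cursor)
        cursor = next_cursor(cursor, bits)
    return sequence


print(full_cursor_sequence(3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

With 8 buckets, cursors range over 0 to 7 however few elements the table holds, and they do not appear in increasing order. The real implementation also has to stay correct when the table grows or shrinks between calls, which this model leaves out.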

3.3.4 How to tell when an iteration ends

  • The cursor returned by SCAN is not necessarily increasing, and a call may return 0 elements.
  • An empty element list does not mean the iteration has ended.
  • A complete iteration starts with cursor 0 and ends when SCAN returns cursor 0;
  • The iteration state lives entirely in the cursor, so the server keeps no per-iteration state: multiple iterations can run concurrently, and an iteration can be abandoned at any time;
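These end-of-iteration rules translate into a simple client loop: start at 0, keep going through empty batches, and stop only when the returned cursor is 0. This is a minimal Python sketch. It assumes a client object whose `scan(cursor=..., match=..., count=...)` method returns a `(next_cursor, keys)` pair, which is the shape redis-py's `Redis.scan` returns. `scan_all` and `FakeClient` are hypothetical names, and the fake's replies are invented for illustration.

```python
from typing import Iterator, Optional


def scan_all(client, match: Optional[str] = None,
             count: Optional[int] = None) -> Iterator:
    """Yield every key from one complete SCAN iteration (start and end at 0)."""
    cursor = 0                                   # a complete iteration starts at 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=match, count=count)
        yield from keys                          # an empty batch is not the end
        if cursor == 0:                          # only cursor 0 ends the iteration
            break


class FakeClient:
    """Hypothetical stand-in that replays canned (next_cursor, keys) replies."""

    def __init__(self, replies):
        self.replies = replies                   # cursor -> (next_cursor, keys)

    def scan(self, cursor=0, match=None, count=None):
        return self.replies[cursor]


fake = FakeClient({0: (88, []), 88: (34, ["k1"]), 34: (107, []), 107: (0, ["k2"])})
print(list(scan_all(fake, match="k?")))  # ['k1', 'k2']
```

redis-py also provides `scan_iter(match=..., count=...)`, which wraps the same loop. Note that redis-py returns keys as bytes unless the client is created with `decode_responses=True`.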

3.3.5 Iteration guarantees

  • Elements present in the collection from the start to the end of a complete iteration are always returned;
  • The same element may be returned more than once, so de-duplication is up to the application;
  • Elements added or removed during the iteration may or may not be returned;
  • Small sets composed only of integers (intset encoding), and small hashes and sorted sets (ziplist encoding), are usually returned in full by the first call, regardless of COUNT;
  • Termination: a complete iteration is only guaranteed to finish if the collection does not keep growing; if elements are added faster than SCAN iterates, the iteration may never end.
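Because the same element can come back more than once, the application has to de-duplicate. A minimal Python sketch, assuming the batches from successive SCAN calls have already been collected into lists; `scan_unique` is a hypothetical name.

```python
from typing import Hashable, Iterable, List


def scan_unique(batches: Iterable[Iterable[Hashable]]) -> List[Hashable]:
    """Merge SCAN batches, keeping each element once in first-seen order."""
    seen = set()
    unique = []
    for batch in batches:
        for element in batch:
            if element not in seen:              # SCAN may repeat an element
                seen.add(element)
                unique.append(element)
    return unique


print(scan_unique([["a", "b"], [], ["b", "c"], ["a"]]))  # ['a', 'b', 'c']
```

The set gives an average O(1) membership check per element, at the cost of holding every distinct element in client memory until the iteration finishes.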

3.3.6 Why an iteration sometimes returns the entire collection at once

  • When a collection is small, Redis applies a memory optimization and stores it in a compact encoding instead of a hash table. A compact encoding cannot be walked with a meaningful cursor, so SCAN returns the whole collection in one call, together with cursor 0;
  • What counts as small? See the official Redis memory-optimization documentation.

3.3.7 The COUNT parameter

  • COUNT defaults to 10;
  • On a large dataset without MATCH, each call usually returns COUNT elements or slightly more;
  • COUNT may change from one call to the next, as long as each call passes the cursor returned by the previous one;

3.3.8 The MATCH parameter

  • The pattern uses the same glob-style syntax as KEYS.
  • MATCH filtering is applied after elements are retrieved and just before they are returned, so when few elements match, several calls in a row may return an empty list.
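A toy model makes this concrete: each call first takes about COUNT elements and only then filters them against the pattern, so a call can come back empty even though matches remain. The sketch uses a plain Python list with a linear cursor (Redis actually walks hash-table buckets with a non-linear cursor) and `fnmatch.fnmatchcase` as an approximation of Redis's glob matching; `toy_scan` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


def toy_scan(elements: List[str], cursor: int, count: int = 10,
             pattern: Optional[str] = None) -> Tuple[int, List[str]]:
    """Toy SCAN: examine up to `count` elements from `cursor`, then apply MATCH."""
    batch = elements[cursor:cursor + count]            # step 1: retrieve
    if pattern is not None:                            # step 2: filter afterwards
        batch = [e for e in batch if fnmatchcase(e, pattern)]
    next_cursor = cursor + count
    if next_cursor >= len(elements):
        next_cursor = 0                                # 0 signals the end
    return next_cursor, batch


keys = ["user:1", "order:1", "order:2", "order:3", "user:2"]
cursor = 0
while True:
    cursor, batch = toy_scan(keys, cursor, count=2, pattern="user:*")
    print(cursor, batch)
    if cursor == 0:
        break
# 2 ['user:1']
# 4 []
# 0 ['user:2']
```

The middle call examines two elements, finds no match, and returns an empty batch even though "user:2" has not been returned yet.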

4. SCAN command examples

4.1 SCAN example

See 5.2 Answers to some questions for SCAN examples.

4.2 SSCAN example

// SSCAN example @zxiaofan
127.0.0.1:6378> SADD sscantest sscantest:1 1 sscantest:2 2 sscantest:3 3 sscantest:4 4 sscantest:1a 1a sscantest:2a 2a sscantest:1ab 1ab sscantest:a1 a1 sscantest:aa1 aa1
(integer) 0
// MATCH ?: these two calls find no matching elements
127.0.0.1:6378> SSCAN sscantest 0 MATCH ? COUNT 1
1) "24"
2) (empty list or set)
127.0.0.1:6378> SSCAN sscantest 24 MATCH ? COUNT 1
1) "20"
2) (empty list or set)
127.0.0.1:6378> SSCAN sscantest 0 MATCH * COUNT 1
1) "24"
2) 1) "sscantest:3"
   2) "sscantest:2a"
127.0.0.1:6378> SSCAN sscantest 24 MATCH * COUNT 1
1) "20"
2) 1) "a1"

4.3 HSCAN example

// HSCAN example @zxiaofan
127.0.0.1:6378> HMSET hscantest hscantest:1 1 hscantest:2 2 hscantest:3 3 hscantest:4 4 hscantest:1a 1a hscantest:2a 2a hscantest:1ab 1ab hscantest:a1 a1 hscantest:aa1 aa1
OK
127.0.0.1:6378> HSCAN hscantest 0 MATCH hscantest*a COUNT 20
1) "0"
2) 1) "hscantest:1a"
   2) "1a"
   3) "hscantest:2a"
   4) "2a"
127.0.0.1:6378> HSCAN hscantest 0 MATCH hscantest*a COUNT 2
1) "0"
2) 1) "hscantest:1a"
   2) "1a"
   3) "hscantest:2a"
   4) "2a"
127.0.0.1:6378>


As the HSCAN example shows, every matching element is returned even when COUNT is 2. This is the behavior described earlier: when a collection is small, all of its data is returned at once.

4.4 ZSCAN example

// ZSCAN example @zxiaofan
// ZPOPMAX removes and returns up to count members with the highest scores; count defaults to 1
127.0.0.1:6378> ZPOPMAX zscantest 20
 1) "sscantest:1ab"
 2) "6"
 3) "sscantest:2a"
 4) "5"
 5) "sscantest:1a"
 6) "4"
 7) "sscantest:3"
 8) "3"
 9) "zscantest:1"
10) "2"
11) "sscantest:2"
12) "2"
13) "test1"
14) "1"
15) "sscantest:1"
16) "1"
127.0.0.1:6378> ZPOPMAX zscantest 20
(empty list or set)
127.0.0.1:6378> ZADD zscantest 1 zscantest:1 2 zscantest:2 3 zscantest:3 4 zscantest:1a 5 zscantest:2a 6 zscantest:1ab 7 zscantest:a1 8 zscantest:aa1
(integer) 8
// NX: only add new members, never update existing ones; CH: return the number of changed elements (including added ones)
127.0.0.1:6378> ZADD zscantest NX CH 1 test1 2 zscantest:1
(integer) 1
127.0.0.1:6378> ZSCAN zscantest 0 MATCH *a COUNT 5
1) "0"
2) 1) "zscantest:1a"
   2) "4"
   3) "zscantest:2a"
   4) "5"
127.0.0.1:6378>

5. Summary

5.1 How many of these interview questions can you answer?

  • Can SCAN iterations run concurrently?
  • Has the iteration ended when SCAN returns an empty list?
  • If the cursor of the first call is not 0, can a complete iteration still be achieved?
  • Can the number of elements returned by each call be strictly controlled?
  • Does the iteration return complete data?
  • Why might the element list returned by a call be empty?

5.2 Answers to some questions

5.2.1 Has the iteration ended when SCAN returns an empty list?

// Has the iteration ended when SCAN returns an empty list? @zxiaofan
127.0.0.1:6378> keys k?
1) "k1"
2) "k2"
127.0.0.1:6378> SCAN 0 MATCH k?
1) "88"
2) (empty list or set)
127.0.0.1:6378> SCAN 88 MATCH k?
1) "34"
2) 1) "k1"
127.0.0.1:6378> SCAN 34 MATCH k?
1) "122"
2) (empty list or set)
127.0.0.1:6378> SCAN 122 MATCH k?
1) "14"
2) (empty list or set)
127.0.0.1:6378> SCAN 14 MATCH k?
1) "33"
2) (empty list or set)
127.0.0.1:6378> SCAN 33 MATCH k?
1) "53"
2) (empty list or set)
127.0.0.1:6378> SCAN 53 MATCH k?
1) "93"
2) (empty list or set)
127.0.0.1:6378> SCAN 93 MATCH k?
1) "107"
2) 1) "k2"
127.0.0.1:6378> SCAN 107 MATCH k?
1) "79"
2) (empty list or set)
127.0.0.1:6378> SCAN 79 MATCH k?
1) "0"
2) (empty list or set)
127.0.0.1:6378>

In the example above, the pattern “k?” matches two keys, “k1” and “k2”. Many calls during the iteration return an empty list, but the iteration has not ended at those points. The iteration is over only when the returned cursor is 0.

5.2.2 If the cursor of the first call is not 0, can a complete iteration be achieved?

// If the cursor of the first call is not 0, can the iteration be completed? @zxiaofan
127.0.0.1:6378> keys k?
1) "k1"
2) "k2"
127.0.0.1:6378> SCAN 66 MATCH k?
1) "122"
2) (empty list or set)
127.0.0.1:6378> SCAN 122 MATCH k?
1) "14"
2) (empty list or set)
127.0.0.1:6378> SCAN 14 MATCH k?
1) "33"
2) (empty list or set)
127.0.0.1:6378> SCAN 33 MATCH k?
1) "53"
2) (empty list or set)
127.0.0.1:6378> SCAN 53 MATCH k?
1) "93"
2) (empty list or set)
127.0.0.1:6378> SCAN 93 MATCH k?
1) "107"
2) 1) "k2"
127.0.0.1:6378> SCAN 107 MATCH k?
1) "79"
2) (empty list or set)
127.0.0.1:6378> SCAN 79 MATCH k?
1) "0"
2) (empty list or set)
127.0.0.1:6378>

In the example above, the pattern “k?” matches two keys, “k1” and “k2”. The first SCAN call used cursor 66, and when the returned cursor reached 0 after several calls, “k1” had never been returned. Therefore, if the cursor of the first call is not 0, a complete iteration cannot be guaranteed.

A complete iteration must start with a cursor at 0 and end with a cursor at 0.
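The same effect can be reproduced with a model of the reverse-binary cursor order used for Redis hash tables: starting from a non-zero cursor skips every bucket that comes before it in scan order. The Python sketch below assumes a fixed table of 128 buckets (the real table size behind the transcript is unknown) and reuses the starting cursor 66 from the transcript; the helper names are hypothetical, and rehashing is ignored.

```python
from typing import List


def reverse_bits(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def next_cursor(cursor: int, bits: int) -> int:
    """Reverse-binary increment over a hypothetical table of 2**bits buckets."""
    mask = (1 << bits) - 1
    return reverse_bits((reverse_bits(cursor & mask, bits) + 1) & mask, bits)


def buckets_visited(start: int, bits: int) -> List[int]:
    """Buckets scanned from `start` until the cursor comes back to 0."""
    visited = [start]
    cursor = next_cursor(start, bits)
    while cursor != 0:
        visited.append(cursor)
        cursor = next_cursor(cursor, bits)
    return visited


BITS = 7  # assumed table of 128 buckets
print(len(buckets_visited(0, BITS)))   # 128: every bucket is scanned
print(len(buckets_visited(66, BITS)))  # 95: buckets before 66 in scan order are skipped
```

In this model, any key stored in one of the 33 skipped buckets can never be returned, which matches what happened to “k1” in the transcript.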

6. Afterword

This article compares the Redis SCAN commands in detail, walks through practical examples, and collects frequently asked interview questions. Readers are encouraged to try the commands themselves for the best results. Follow @zxiaofan’s “Playing With Redis” series and grow together. How many of the interview questions can you answer now?


Good luck!

Life is all about choices!

In the future, you will be grateful for your hard work now!
