1. What is the SCAN command?
- The SCAN command incrementally iterates over the keys in a Redis database.
- The KEYS command is probably more familiar, but it has serious drawbacks; SCAN is the better choice in a production environment.
2. SCAN command versus KEYS command
2.1 Time Complexity
Both SCAN and KEYS have O(N) time complexity, where N is the number of keys in the database; in this respect the two commands are the same.
2.2 Can a partial traversal be performed?
- KEYS has no notion of a limit: it returns every matching key in a single reply, all at once.
- SCAN returns results in batches, and its COUNT parameter hints at the maximum amount of work done per call.
2.3 Advantages of SCAN over KEYS
Although the overall time complexity is still O(N), SCAN does its work in small batches, so it never blocks the server's single thread for long stretches the way a KEYS call over a large keyspace does.
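The batching behavior can be sketched with a toy stand-in for SCAN (the `scan` function below is a hypothetical stub over a stable key snapshot, not the real bucket-walking implementation; it only illustrates the cursor protocol a client follows):

```python
def scan(db_keys, cursor, count=10):
    """Toy stand-in for Redis SCAN: return (next_cursor, batch).

    Real SCAN walks hash-table buckets; here we simply slice a
    stable snapshot of the keys to illustrate the cursor protocol.
    """
    batch = db_keys[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(db_keys):
        next_cursor = 0  # a cursor of 0 signals the iteration is complete
    return next_cursor, batch

db_keys = [f"key-{i}" for i in range(25)]

# The canonical client loop: keep calling until the cursor comes back as 0.
cursor, seen = 0, []
while True:
    cursor, batch = scan(db_keys, cursor, count=10)
    seen.extend(batch)   # each call holds the server only for one small batch
    if cursor == 0:
        break

print(len(seen))  # 25 -- every key visited, at most 10 per call
```

The server is occupied only for the duration of each individual call, which is why SCAN is safe on large production keyspaces where KEYS is not.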
3. Usage of the SCAN command
3.1 Basic Syntax
```
SCAN cursor [MATCH pattern] [COUNT count]
```
- cursor: the iteration cursor; start a new iteration with 0
- MATCH pattern: an optional glob-style pattern; only matching keys are returned
- COUNT count: a hint for how many elements to examine per call; the default is 10
3.2 Example
```
redis 127.0.0.1:6379> scan 0
1) "17"                # the cursor to pass to the next call
2)  1) "key-12"
    2) "key-8"
    3) "key-4"
    4) "key-14"
    5) "key-16"
    6) "key-17"
    7) "key-15"
    8) "key-10"
    9) "key-3"
   10) "key-7"
   11) "key-1"
redis 127.0.0.1:6379> scan 17
1) "0"                 # a cursor of 0 means the data set has been fully traversed
2) 1) "key-5"
   2) "key-18"
   3) "key-0"
   4) "key-2"
   5) "key-19"
   6) "key-13"
   7) "key-6"
   8) "key-9"
   9) "key-11"
```
4. Key points of the SCAN command
4.1 Guarantees of the SCAN command
A full iteration always returns every element that was present in the data set for the whole duration of the iteration: if an element exists when the iteration starts and still exists when it ends, SCAN is guaranteed to return it to the user at some point.
Some disadvantages
Because SCAN records its iteration state in the cursor rather than on the server, it has some weaknesses:
- The same element may be returned more than once; it is up to the application to handle duplicates.
- An element that is added to or removed from the data set during the iteration may or may not be returned.
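Since the same key may appear in more than one batch, an application that needs each key exactly once should deduplicate on the client side. A minimal sketch (the `scan_fn` callback and the fake batches are illustrative, not a real Redis connection):

```python
def scan_unique(scan_fn):
    """Drive a SCAN-style iterator and yield each key at most once.

    scan_fn(cursor) must return (next_cursor, batch); a next_cursor of 0
    marks the end of the iteration, mirroring Redis SCAN semantics.
    """
    seen = set()
    cursor = 0
    while True:
        cursor, batch = scan_fn(cursor)
        for key in batch:
            if key not in seen:   # drop keys already returned by an earlier batch
                seen.add(key)
                yield key
        if cursor == 0:
            break

# A fake scan that returns "key-3" twice, as a rehash during SCAN might.
batches = {0: (2, ["key-1", "key-2", "key-3"]), 2: (0, ["key-3", "key-4"])}
result = list(scan_unique(lambda c: batches[c]))
print(result)  # ['key-1', 'key-2', 'key-3', 'key-4']
```

The `seen` set costs memory proportional to the keyspace, so for very large databases an idempotent per-key operation is often preferable to client-side deduplication.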
4.2 Number of elements returned per call
- The incremental iteration commands do not guarantee that each call returns a given number of elements.
- A call may even return zero elements, but as long as the returned cursor is non-zero, the application must not treat the iteration as finished.
Some rules of thumb
- For a large data set, a single call may return on the order of a few dozen elements.
- For a sufficiently small data set, a single call may return all of its elements at once: this happens when the underlying representation is a compact encoding (small sets, small hashes, and small sorted sets).
4.3 Why the cursors are not sequential
SCAN is driven by a cursor, and a returned cursor of 0 means the whole data set has been traversed; but as the example above shows, the cursors in between (such as 17) are not consecutive integers. The reason lies in how Redis stores keys and walks its hash table.
4.3.1 How Redis stores keys
- Redis offers Strings, Lists, and other data structures, but those are the structures of the values. The key-to-value mapping itself is implemented as a hash table, similar in shape to Java's HashMap: an array of buckets, each holding a linked list of entries.
- The SCAN cursor is essentially an index into that bucket array: each call walks some buckets and returns the index at which to resume.
4.3.2 SCAN traversal order
- If the hash table never grew or shrank, SCAN could simply walk the bucket array from front to back to collect every key. But resizing does happen, and with it a rehash that scatters entries into different slots, so we must ask whether keys can be visited twice or skipped entirely.
- If the cursor were incremented at the low-order end, i.e. a plain front-to-back walk, the rehash triggered by growing or shrinking the table could cause both repeated visits and missed keys. Redis therefore increments the cursor at the high-order end instead.
Expansion analysis
With high-order carry, an 8-slot table is walked in the order 000 → 100 → 010 → 110 → 001 → 101 → 011 → 111. Suppose the table grows from 4 slots to 8 while the cursor is at 10 (which becomes 010); the walk continues as:

```
010 -> 110 -> 001 -> 101 -> 011 -> 111
```

The slots 000 and 100, which hold the rehashed contents of the already-visited slot 00, are skipped, so nothing traversed before the expansion is returned again.
With an ordinary low-order increment, if the table grows while the cursor is at 01, the walk continues as:

```
001 -> 010 -> 011 -> 100 -> 101 -> 110 -> 111
```

Slot 100 holds rehashed elements from slot 00, which was already visited, so those elements are returned twice. High-order carry is therefore the better choice.
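The high-order carry ("reverse binary iteration") used by Redis can be sketched in a few lines. This is a simplified model with a fixed bit width, not the actual C implementation in `dict.c`, but it reproduces the sequences above:

```python
def rev_bits(v, bits):
    """Reverse the lowest `bits` bits of v."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (v & 1)
        v >>= 1
    return out

def next_cursor(cursor, bits):
    """Advance a SCAN cursor by adding 1 at the HIGH end (reverse-binary)."""
    return rev_bits((rev_bits(cursor, bits) + 1) % (1 << bits), bits)

# Full walk of an 8-slot table: 000 -> 100 -> 010 -> 110 -> 001 -> ...
order, c = [], 0
for _ in range(8):
    order.append(c)
    c = next_cursor(c, 3)
print([format(x, "03b") for x in order])
# ['000', '100', '010', '110', '001', '101', '011', '111']

# Expansion mid-scan: at cursor 10 in a 4-slot table, grow to 8 slots and
# continue from 010 -- already-visited slots 000/100 are never revisited.
c, tail = 0b010, []
while c != 0:
    tail.append(c)
    c = next_cursor(c, 3)
print([format(x, "03b") for x in tail])
# ['010', '110', '001', '101', '011', '111']
```

This also explains the non-sequential cursors in the example of section 3.2: the cursor value is a bucket index counted from the high-order bit.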
4.3.3 Redis Progressive Rehash
Definition
As operations are executed over time, the number of key-value pairs held in a hash table gradually grows or shrinks. To keep the hash table's load factor within a reasonable range, Redis must expand or contract the table when it holds too many or too few key-value pairs.
Process
- Allocate space for the dictionary's ht[1] hash table. Its size depends on the operation being performed and on the number of key-value pairs currently held in ht[0] (i.e. the value of ht[0].used):
  - For an expansion, ht[1] gets the first size of the form 2^n that is greater than or equal to ht[0].used * 2.
  - For a contraction, ht[1] gets the first size of the form 2^n that is greater than or equal to ht[0].used.
- Rehash all key-value pairs held in ht[0] onto ht[1]: "rehash" means recomputing each key's hash and index and placing the pair into its slot in ht[1].
- Once every key-value pair in ht[0] has been migrated to ht[1] (ht[0] is now empty), free ht[0], make ht[1] the new ht[0], and create a fresh empty table at ht[1] in preparation for the next rehash.
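The sizing rule in the first step can be stated as a tiny helper (illustrative, matching the rule as described rather than the Redis C source):

```python
def rehash_target_size(used, expanding):
    """First power of two >= used*2 when expanding, >= used when shrinking."""
    need = used * 2 if expanding else used
    size = 1
    while size < need:
        size *= 2
    return size

print(rehash_target_size(5, expanding=True))    # 16 (first 2^n >= 10)
print(rehash_target_size(100, expanding=False)) # 128 (first 2^n >= 100)
```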
Progressive rehash
To avoid the impact of rehashing on server performance, the server does not rehash every key-value pair from ht[0] to ht[1] in one shot; instead it migrates them gradually, a little at a time:
- Allocate space for ht[1], so that the dictionary holds both the ht[0] and ht[1] tables.
- Maintain an index counter variable rehashidx in the dictionary and set it to 0, marking the official start of the rehash.
- During the rehash, every add, delete, find, or update on the dictionary, in addition to performing the requested operation, also migrates all key-value pairs in bucket rehashidx of ht[0] over to ht[1], then increments rehashidx by one.
- As operations on the dictionary continue, at some point every key-value pair of ht[0] has been rehashed onto ht[1]; the program then sets rehashidx to -1 to indicate that the rehash is complete.
The benefit of progressive rehash is its divide-and-conquer approach: the computation required to rehash all the key-value pairs is spread across every add, delete, find, and update on the dictionary, avoiding the bulk of work that a one-shot, centralized rehash would require.
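The scheme above can be modeled in miniature. This is an assumed, simplified structure (real Redis stores `dictEntry` chains in C and also rehashes a few buckets on a background timer), but it shows the two tables, the rehashidx counter, and the one-bucket-per-operation migration:

```python
class ProgressiveDict:
    """Two-table dict that migrates one ht[0] bucket per operation."""

    def __init__(self, size=4):
        self.ht = [[[] for _ in range(size)], None]  # ht[0], ht[1]
        self.used = 0
        self.rehashidx = -1                          # -1: no rehash running

    def _index(self, key, table):
        return hash(key) % len(self.ht[table])

    def _start_rehash(self):
        need = self.used * 2                         # expansion sizing rule
        size = 1
        while size < need:
            size *= 2
        self.ht[1] = [[] for _ in range(size)]
        self.rehashidx = 0

    def _rehash_step(self):
        if self.rehashidx == -1:
            return
        # Move every entry in bucket ht[0][rehashidx] over to ht[1].
        for key, val in self.ht[0][self.rehashidx]:
            self.ht[1][self._index(key, 1)].append((key, val))
        self.ht[0][self.rehashidx] = []
        self.rehashidx += 1
        if self.rehashidx == len(self.ht[0]):        # migration finished
            self.ht[0], self.ht[1] = self.ht[1], None
            self.rehashidx = -1

    def set(self, key, val):
        self._rehash_step()
        table = 1 if self.rehashidx != -1 else 0     # new writes go to ht[1]
        self.ht[table][self._index(key, table)].append((key, val))
        self.used += 1
        if self.rehashidx == -1 and self.used > len(self.ht[0]):
            self._start_rehash()

    def get(self, key):
        self._rehash_step()
        for table in (0, 1):                         # look in both tables
            if self.ht[table] is None:
                continue
            for k, v in self.ht[table][self._index(key, table)]:
                if k == key:
                    return v
        return None

d = ProgressiveDict()
for i in range(20):
    d.set(f"key-{i}", i)
print(all(d.get(f"key-{i}") == i for i in range(20)))  # True
```

Note that while a rehash is in progress, lookups must consult both tables and new writes go only to ht[1], which guarantees ht[0] steadily drains until the swap.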
5. Reference materials
- www.runoob.com/redis/keys-…
- Redis Design and Implementation