The dynamic string – SDS
Memory allocation and release
We know that memory is handed out in blocks: when we want to assign a value to a variable, we first have to request a piece of memory to hold it.

Say we want `var db=redis`. We first `malloc(5)` bytes of space, and once we have those 5 bytes we can assign the value to `db`.

Now things change, and we want to go from `db=redis` to `db=memcache`. We can see we only need three more bytes, so can we just request those three? Computers aren't that smart: we have to request a whole new 8 bytes with `malloc(8)`.

And once our `db` variable has become `memcache`, we still have to release the original `redis` space.

Summary: the whole process is malloc(5) for redis, then malloc(8) for memcache, and finally free(5) on the old redis, two allocations in total. If we change our variable n times, we end up allocating n times and releasing n-1 times.
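In plain C, the churn looks something like this (a minimal sketch of the steps above):

```c
#include <stdlib.h>
#include <string.h>

int main(void) {
    // Naive strings: every new value needs a fresh allocation
    // plus a release of the old one.
    char *db = malloc(5 + 1);          // +1 for the trailing '\0'
    memcpy(db, "redis", 6);

    char *tmp = malloc(8 + 1);         // second allocation
    memcpy(tmp, "memcache", 9);
    free(db);                          // release the old redis space
    db = tmp;

    free(db);
    return 0;
}
```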
SDS takes the stage
The process above causes constant malloc and free churn, so the Redis developers invented SDS (Simple Dynamic String). When we run set key value, the key corresponds to one SDS structure and the value to another. Its structure looks roughly like this:
```c
struct sdshdr {
    int len;     // length of the string currently stored
    int free;    // unused bytes remaining in buf
    char buf[];  // the actual bytes of the string
};
```
- `free`: how much unused space is left in the SDS
- `len`: the length of the string
- `buf`: where the value is actually stored
Here’s an example:
Pre-allocating space
Say we have key=redis and its buffer is exactly full: `free=0`, `len=5`.

Every string ends with a \0 so that readers know where to stop, which means the real footprint of an SDS is the string's length plus 1 byte.
Now we run set key memcache. The key has run out of space, so we have to request more memory, but instead of asking for exactly 8 bytes we ask for 16, giving free=8 and len=8. This is the SDS pre-allocation policy: when space is insufficient, Redis requests twice the space it needs; but once the needed space exceeds 1 MB, it requests a fixed extra 1 MB instead. Now if we run set key memcacheGood, we find that free still has 8 spare bytes, so no new allocation is needed at all.
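A sketch of this grow policy in C, in the spirit of Redis' sdsMakeRoomFor (simplified for illustration; not the actual Redis source):

```c
#include <stdlib.h>

#define SDS_MAX_PREALLOC (1024 * 1024)  /* the 1 MB threshold above */

// Reusing struct sdshdr from earlier. Grow the SDS so it can hold
// `addlen` more bytes: double the needed size while it is small,
// add a fixed 1 MB once the needed size exceeds 1 MB.
struct sdshdr *sds_make_room(struct sdshdr *sh, int addlen) {
    if (sh->free >= addlen) return sh;          // enough spare space already

    int newlen = sh->len + addlen;
    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;                            // double while below 1 MB
    else
        newlen += SDS_MAX_PREALLOC;             // +1 MB once above it

    sh = realloc(sh, sizeof(*sh) + newlen + 1); // +1 for the '\0'
    sh->free = newlen - sh->len;
    return sh;
}
```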
Lazy space release
When the value shrinks from memcacheGood back to memcache, Redis keeps the extra four bytes in free instead of giving them back.
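A hypothetical helper makes the bookkeeping visible: shrinking only updates len and free, and nothing is handed back to the allocator:

```c
#include <string.h>

// Shrink the stored string in place, reusing struct sdshdr from above.
// Going from "memcacheGood" (len=12) to "memcache" (len=8), the four
// bytes simply move from len to free; no free()/realloc() happens.
void sds_shrink_to(struct sdshdr *sh, const char *s, int newlen) {
    memcpy(sh->buf, s, newlen);
    sh->buf[newlen] = '\0';
    sh->free += sh->len - newlen;   // reclaimed bytes stay available
    sh->len = newlen;
}
```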
conclusion
Redis SDS uses this redundant space to cut down on memory allocations and releases and thus improve speed; the trade-off is extra memory consumption.
The compressed list – Ziplist
We know that arrays are contiguous in memory; this design makes good use of the CPU cache, so access is fast. But arrays have a disadvantage: every element must be the same size, even if some elements only need a small amount of space.
For example, 1, 2, 3 and 4 each fit in a single byte (int8), but 999 obviously doesn't: it needs at least an int16. Because of 999, the elements 1, 2, 3 and 4 must also be stored as int16, which wastes space.

But if each element can have a different length, the CPU no longer knows how to read the array (how many bytes at a time?). That's easy to solve: prefix each element with its length, so we always know how many bytes to read next.
Hence Redis' ziplist. A compressed list is a sequential data structure designed to save memory, and it is used as one of the underlying implementations of list keys and hash keys. A compressed list can contain multiple nodes, and each node can hold a byte array or an integer value. When a list or hash holds only a few elements, and each element is a small integer or a short string, the bottom layer uses a ziplist to store it. The ziplist structure is as follows:
Each entry is made up of three parts: `previous_entry_length`, `encoding` and `content`.
- previous_entry_length: records the length of the previous node. It occupies either 1 byte or 5 bytes: 1 byte if the previous node is shorter than 254 bytes, 5 bytes otherwise. With previous_entry_length we can compute the address of the previous node (current address minus previous_entry_length), which is what makes back-to-front traversal possible.
- encoding: records the type and length of the data held in the node's content attribute.
  - When the encoding is 1, 2 or 5 bytes long and its highest two bits are 00, 01 or 10, it is a byte-array encoding: content holds a byte array, whose length is recorded by the bits of the encoding excluding the highest two.
  - When the encoding is 1 byte long and its highest two bits are 11, it is an integer encoding: content holds an integer value, whose type and length are recorded by the bits of the encoding excluding the highest two.
- content: the value actually stored in the node.
  - For example, take the encoding 00001011: the highest two bits 00 say that content holds a byte array, and the remaining six bits 001011 (decimal 11) give the length of content.
  - An encoding whose highest two bits are 11 says that content holds a number.
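A small sketch of reading that first encoding byte (simplified: the real ziplist also has multi-byte length fields and several integer subtypes):

```c
#include <stdint.h>
#include <stdio.h>

// Classify a ziplist-style encoding byte by its highest two bits:
// 00 -> short byte array (length in the low six bits), 11 -> integer.
void describe_encoding(uint8_t enc) {
    uint8_t top = enc >> 6;                  // highest two bits
    if (top == 3)                            // 11xxxxxx
        printf("integer encoding, subtype %u\n", enc & 0x3F);
    else if (top == 0)                       // 00xxxxxx
        printf("byte array, length %u\n", enc & 0x3F);
    else                                     // 01xxxxxx / 10xxxxxx
        printf("byte array with a longer length field\n");
}

int main(void) {
    describe_encoding(0x0B);  // 00001011 -> byte array of length 11
    describe_encoding(0xC0);  // 11000000 -> an integer
    return 0;
}
```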
conclusion
A compressed list is a sequential data structure developed to save memory. A compressed list is used as one of the underlying implementations of list keys and hash keys. A compressed list can contain multiple nodes, each of which can hold a byte array or integer value. Ziplist is used when the list or hash has a small number of elements and the elements are small integer values or short strings.
The hash table
In Redis, the hash table is not only used for the hash data type: all of Redis' keys and values live in one big hash table. For example, with our usual set key value, key is the hash key and value is the hash value.
When a new key is added, a hash function first computes which bucket it belongs in. If an element is already sitting in that bucket, we have a hash collision. Like many implementations, Redis resolves collisions with a linked list, but the list keeps no tail pointer: if new elements were appended at the tail, inserting would cost O(N). Since a freshly added element usually has a higher chance of being accessed soon, on collision the new element is inserted at the head of the list.
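A minimal sketch of that head insertion (illustrative types, not the real dict.c):

```c
#include <stdlib.h>

typedef struct dictEntry {
    char *key;
    char *val;
    struct dictEntry *next;   // chain of entries that collided here
} dictEntry;

// Insert at the head of a bucket's chain: O(1) regardless of chain
// length, and the newest (most likely accessed) entry is found first.
void bucket_insert(dictEntry **bucket, char *key, char *val) {
    dictEntry *e = malloc(sizeof(*e));
    e->key = key;
    e->val = val;
    e->next = *bucket;        // old head becomes the second element
    *bucket = e;              // new entry becomes the head
}
```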
As a hash table gains elements, collisions become more frequent and query efficiency drops. As it loses elements, some buckets sit empty and their space is wasted. This is where expanding and shrinking the hash table come in.
- When too many elements pile up in bucket 0, looking up kn approaches O(N)
- When buckets 0 and 2 are empty, their space is simply wasted
Redis actually keeps two hash tables from the start, h1 and h2, but only h1 serves requests; h2 stands ready for rehashing. When no rehash is in progress, h2 is empty. When a rehash starts, the size of h2 is derived from the number of key-value pairs in h1.
What drives a rehash is the load factor: load factor = number of nodes stored in the hash table / size of the hash table.
- Expansion: if the server is not running BGSAVE or BGREWRITEAOF, expansion happens once the load factor reaches 1; if the server is running BGSAVE or BGREWRITEAOF, it happens only once the load factor reaches 5. The size of h2 is the first power of 2 greater than or equal to twice the number of key-value pairs in h1.
- Shrinking: when the load factor drops below 0.1, the table is shrunk, and the size of h2 is the first power of 2 greater than or equal to the number of key-value pairs in h1.
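The sizing rule is easy to express as code (a sketch mirroring the rule above; Redis' own dictNextPower works the same way, doubling up from a minimum size of 4):

```c
// First power of 2 >= target. For expansion, target = 2 * h1_used;
// for shrinking, target = h1_used.
unsigned long next_power(unsigned long target) {
    unsigned long size = 4;          // minimum table size
    while (size < target) size *= 2;
    return size;
}
```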
Trigger timing: serverCron checks periodically, and so does every addition of a key and value; Redis then slowly migrates h1's key-value pairs over to h2. When the migration finishes, h1 is empty and the two tables swap roles: h2 becomes h1 and h1 becomes h2.
Why raise the load-factor threshold while BGSAVE or BGREWRITEAOF is running?
We know that BGSAVE and BGREWRITEAOF are time-consuming by nature and must not block the main thread, so Redis runs them by forking a child process. The child does not copy a fresh copy of memory; it shares memory with its parent. This is the copy-on-write (COW) mechanism supported by modern operating systems, and its benefit is saving memory: why duplicate memory that is identical on both sides?

But what happens when a write occurs in the main process after the fork? Copy the whole memory? Of course not. Memory is managed in pages, so when the main process writes, only the modified page needs to be copied; every other page stays shared, which preserves the memory savings. COW relies on page faults: when the parent forks the child, all pages are marked read-only, so an incoming write triggers a page fault and only that page gets copied. While a hash table is being expanded there are bound to be many writes, which would cause a lot of page copying, so Redis raises the load factor to avoid rehashing at the same time.
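A tiny demo of the fork() behavior that COW makes cheap: both processes read the same data for free, and only the child's write causes its page to be copied:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char *buf = malloc(4096);          // roughly one page of data
    strcpy(buf, "redis");

    pid_t pid = fork();                // child shares pages with parent
    if (pid == 0) {
        printf("child reads:  %s\n", buf);  // a read copies nothing
        strcpy(buf, "memcache");            // write -> page fault -> only
                                            // this page is copied for the child
        printf("child wrote:  %s\n", buf);
        exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent still sees: %s\n", buf); // unchanged: "redis"
    free(buf);
    return 0;
}
```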
Progressive rehash
When Redis holds only a few keys and values, rehashing everything in one go might be fine. But when it holds a huge number, the CPU work balloons, and a one-shot rehash becomes unrealistic. To keep the rehash from hurting the service itself, it is performed across many small steps instead of all at once.
- During a rehash, updates, lookups and deletes may run against both tables: a lookup checks h1 first and, if nothing is found, falls back to h2. New insertions go straight into h2.
- When serverCron runs, it budgets 1 ms for rehashing the main key space and 1 ms for rehashing the expiring keys.

The rehash only moves pointers, so migrating a 1 KB value costs the same as migrating a 1 MB one.
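A simplified step function in the spirit of Redis' dictRehash, reusing the dictEntry from the earlier sketch (hash_key is an assumed helper):

```c
typedef struct dict {
    dictEntry **h1, **h2;          // the two tables
    unsigned long h1_size, h2_size;
    long rehashidx;                // next h1 bucket to migrate; -1 = idle
} dict;

unsigned long hash_key(const char *key, unsigned long size); // assumed

// Migrate one bucket from h1 to h2, then advance the cursor.
// Called a little at a time from lookups/updates and serverCron.
void rehash_step(dict *d) {
    if (d->rehashidx < 0) return;                  // no rehash running
    while ((unsigned long)d->rehashidx < d->h1_size &&
           d->h1[d->rehashidx] == NULL)
        d->rehashidx++;                            // skip empty buckets
    if ((unsigned long)d->rehashidx >= d->h1_size) {
        d->rehashidx = -1;                         // migration finished
        return;
    }
    dictEntry *e = d->h1[d->rehashidx];
    while (e) {                                    // move the whole chain
        dictEntry *next = e->next;
        unsigned long idx = hash_key(e->key, d->h2_size);
        e->next = d->h2[idx];                      // head-insert into h2
        d->h2[idx] = e;
        e = next;
    }
    d->h1[d->rehashidx] = NULL;
    d->rehashidx++;
}
```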
conclusion
The hash table is a data structure Redis uses heavily, and progressive rehash is one embodiment of Redis' high performance.
The skip list
Redis’ ordered set, zset, is useful for some scenarios, such as student grades, recently active users, etc. Its underlying implementation is skip lists, and the benefit of skip lists is that they speed up access to certain nodes.
- When there is no skip list, to access 7, it must be 1->2->3->4->5->6->7
- When we add one more layer on top of 1, 4, and 7, then access 7 can be 1->4->7, in three steps
A real skip list is implemented like the second figure above: each node has its own number of layers, and the layers are used to jump ahead and speed up access to other nodes. The higher the layers, the faster access is in theory, but the more space it takes: a typical space-for-time trade-off.
Skip list characteristics
Let's see how the skip list finds 20. We start at the top layer: 20 is greater than 1, so we move right and reach 12; 20 is also greater than 12, but the next node 38 is greater than 20, so we drop down a layer at 12. Moving right we meet 38 again, so we drop down at 12 once more; moving right on the bottom layer we find 20, and the search is done.
- Each node has at least one layer
- Each layer is an ordered linked list
- The lowest level ordered list holds all the data
- If a node appears at layer x, it also appears at every layer below x
- Each node contains two pointers, one pointing right and one pointing down
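The lookup walk described above, as a sketch (the real Redis node also carries a span, a backward pointer and an SDS member, and compares members as well as scores):

```c
#include <stddef.h>

typedef struct zslNode {
    int score;
    struct zslNode *forward[];   // one "right" pointer per layer
} zslNode;

// Start at the top layer; go right while the next score is still
// smaller than the target, otherwise drop down a layer.
zslNode *zsl_search(zslNode *header, int levels, int target) {
    zslNode *x = header;
    for (int i = levels - 1; i >= 0; i--) {
        while (x->forward[i] && x->forward[i]->score < target)
            x = x->forward[i];           // move right on this layer
        // loop continues: drop down to layer i-1
    }
    x = x->forward[0];                   // candidate on the bottom layer
    return (x && x->score == target) ? x : NULL;
}
```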
Redis skip list
Let’s look at the Redis zset structure:
- header: points to the head node. When a zset is initialized, Redis creates a skip-list head node with 32 layers, score 0 and no value
- tail: points to the tail node of the list
- level: the height of the highest layer, not counting the head node
- length: the number of nodes, not counting the head node
- span: can be used to work out a node's rank within the sorted set
- backward pointer: every node has a pointer back to the previous node
- score: the weight that determines ordering
- member: each node's object pointer, which holds an SDS value
conclusion
Whenever a new node is inserted, Redis gives it a random layer height. Skip-list operations take O(log N) time on average and O(N) in the worst case, compared with O(N) for a plain linked list.
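The random layer height comes from repeated coin flips, as in Redis' zslRandomLevel (shown here with the same constants the document's 32-layer header implies):

```c
#include <stdlib.h>

#define ZSKIPLIST_MAXLEVEL 32   // matches the 32-layer header above
#define ZSKIPLIST_P 0.25        // chance of gaining each extra layer

// Each extra layer is granted with probability 1/4, so tall towers
// are exponentially rare and the expected height stays small.
int random_level(void) {
    int level = 1;
    while ((rand() & 0xFFFF) < (int)(ZSKIPLIST_P * 0xFFFF) &&
           level < ZSKIPLIST_MAXLEVEL)
        level++;
    return level;
}
```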