We all know that Redis has five data types, but what lies underneath them? From a design standpoint, two ideas recur. First, save time, for example by trading space for time; this shows up everywhere, from algorithms to MySQL. Second, once time is guaranteed, use as little space as possible. Redis builds its types on the following underlying data structures: 1. simple dynamic strings; 2. skip lists; 3. compressed lists (ziplist); 4. dictionaries; 5. integer sets; 6. quicklist. Let's take a look at the design of each of these special structures.

1. Simple dynamic strings

Simple dynamic strings (SDS) are one of Redis's basic data structures, used to store string and integer data while guaranteeing binary safety.

What is binary security?

In C, the character "\0" marks the end of a string. If the string content itself contains "\0", the string is truncated at that point; in other words, C strings are not binary safe. If instead we have some mechanism by which "\0" does not cause truncation, we get binary safety. To achieve it, Redis versions before 3.2 designed the following structure:

struct sds {
	int len;    // number of bytes used in buf
	int free;   // number of bytes still free in buf
	char buf[]; // data space
};

So, thanks to len, reading and writing strings does not depend on the “\0” terminator, ensuring binary safety.

But is there room for improvement in this structure? Do strings of different lengths really need headers of the same size, with len and free both declared as int? For short strings an int is a waste of space. So let's consider three cases: short strings, long strings, and extremely long strings. For short strings, 1 byte each is enough for len and free (lengths up to 2^8 - 1); for long strings, 4 bytes; for extremely long strings, 8 bytes.

This is all well and good, but it introduces two new problems. Question 1: how do we tell the cases apart? Question 2: for a short string the header is still too heavy; len and free together take 2 bytes, which is expensive. For question 1, we add a flags field to identify the type; for question 2, since len is already at its 1-byte minimum, further compression means storing the length in just a few bits.

To solve both problems, 5 string types are defined (header length fields of 1, 2, 4, and 8 bytes, plus one type that needs less than 1 byte), which requires at least 3 bits to store the type (2^3 = 8). A byte has 8 bits, leaving 5 bits to store the length. So the following structure stores short strings of length less than 32:

struct sds {
	char flags; // in C, char is one byte (in Java, char is two bytes)
	char buf[]; // actual content
};

The structure is shown in the figure:

If a string is longer than 31, the 5 spare bits in flags can no longer hold the length, and the structure has to bring back len and alloc:

struct sds {
	uint8_t len;         // used length; 1 byte here, widened to 16/32/64 bits for the other types
	uint8_t alloc;       // total allocated length, stored in 1 byte here
	unsigned char flags; // lower 3 bits store the type (i.e. which length class); upper 5 bits are reserved and unused
	char buf[];          // actual content
};

The structure diagram is as follows:
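To make the header layout concrete, here is a minimal C sketch of how the length is recovered through flags, modeled on the sdshdr types in Redis's sds.h (simplified to two of the five types; the field names follow the structures above):

#include <stddef.h>
#include <stdint.h>

#define SDS_TYPE_5    0   /* length lives in the upper 5 bits of flags */
#define SDS_TYPE_8    1   /* 1-byte len/alloc header, as shown above */
#define SDS_TYPE_MASK 7   /* the lower 3 bits select the type */

struct sdshdr8 {
	uint8_t len;          /* used length */
	uint8_t alloc;        /* allocated length, header excluded */
	unsigned char flags;  /* type in the lower 3 bits */
	char buf[];           /* actual content */
};

/* Redis hands callers a pointer to buf, so flags always sits at s[-1]. */
static size_t sds_len(const char *s) {
	unsigned char flags = (unsigned char)s[-1];
	switch (flags & SDS_TYPE_MASK) {
	case SDS_TYPE_5:
		return flags >> 3;  /* the 5 spare bits are the length itself */
	case SDS_TYPE_8: {
		const struct sdshdr8 *sh =
		    (const struct sdshdr8 *)(s - sizeof(struct sdshdr8));
		return sh->len;
	}
	default:
		return 0;           /* TYPE_16/32/64 follow the same pattern */
	}
}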

Basic operations

The basic operations on any data structure are no more than create, read, update, and delete. Since SDS after Redis 3.2 involves multiple types, modifying a string's content may change the SDS type and trigger expansion.

Creating a string:

The general process for creating an SDS is to work out the header type and size from the initial length, then dynamically allocate the memory. Note the following three points:

  1. When an empty string is created, the type is cast to SDS_TYPE_8.

  2. The “+1” operation is used to calculate the length in order to count the terminator “\0”.

  3. The return value is a pointer to the buf field of the SDS structure.

Releasing a string:

To optimize performance and reduce the overhead of memory reallocation, SDS offers a way to "clear" a string by resetting its bookkeeping instead of freeing the memory: len is simply set to zero. The existing buf is not actually wiped, and new data can later overwrite it without reallocating memory.

Concatenating strings (this has come up in interviews, e.g. at NetEase):

The steps are as follows:

  1. If the free space remaining in the SDS is greater than the length of the new content, append directly to the end of buf without expansion.

  2. If the remaining space is less than or equal to the length of the new content, expand as needed: if the new total length len + addlen is less than 1MB, reserve twice the new length; if it is 1MB or more, reserve the new length plus 1MB (see the sketch after this list).

  3. Finally, re-select the storage type based on the new length (because the type may have changed) and allocate space accordingly.

  • If there is no need to change the type, expand the flexible array directly.

  • If the type needs to change, reallocate memory and move the original string's buf content to the new location.
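Here is a minimal sketch of the sizing rule just described, mirroring the logic of Redis's sdsMakeRoomFor (only the size computation is shown; the reallocation itself is omitted):

#include <stddef.h>

#define SDS_MAX_PREALLOC (1024 * 1024)  /* the 1 MB threshold */

/* How much capacity to reserve when appending addlen bytes to a
 * string currently len bytes long. */
static size_t sds_grow_size(size_t len, size_t addlen) {
	size_t newlen = len + addlen;
	if (newlen < SDS_MAX_PREALLOC)
		newlen *= 2;                 /* small result: double it */
	else
		newlen += SDS_MAX_PREALLOC;  /* large result: add 1 MB */
	return newlen;
}

The pre-allocation means repeated appends do not reallocate every time, at the cost of some slack space.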

Summary: for dynamic strings the main consideration is space utilization; the design aims to save memory.

2. Skip lists

Background: ordered sets are common in everyday life, such as ranking students by score or ranking game players by points. For the underlying implementation of an ordered set we could use arrays, linked lists, or balanced trees: arrays make insertion and deletion awkward; linked lists query slowly, since every lookup walks the elements; balanced trees and red-black trees are efficient but complex to implement. Redis uses a different structure, the skip list, which is about as efficient as a red-black tree but far simpler to implement.

The difference between an ordered linked list and a skip list

With the plain ordered linked list, searching for the value 51 takes 6 lookups;

with the skip list, querying the value 51 takes only 5 lookups.

So the skip list needs fewer lookups, and the larger the data set, the more obvious the advantage.

Skip list nodes and structure

A skip list node contains:

  • Ele: used to store string data (member)

  • Score: Used to store the score value for sorting

  • Backward: a backward pointer, which can only point to the previous node at the lowest level; for the head node and the first node, backward points to NULL. It is used when traversing the skip list from back to front.

  • Level: a flexible array whose length differs from node to node. When a skip list node is created, a random level from 1 to 64 is generated (a sketch of this follows below). Each entry in the level array contains two fields. Forward: points to the next node at this layer, or NULL for the last node. Span: the number of elements between this node and the forward node; a larger span means more nodes are skipped at once.

    A skip list is one of the underlying implementations of the Redis ordered set, so each node's ele stores a member value and score stores that member's score. Nodes are sorted in ascending order of score; if all members in the ordered set have the same score, nodes are sorted in lexicographical order of member.
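The random level mentioned above is generated roughly as follows, a sketch in the style of zslRandomLevel in Redis's t_zset.c, where each additional level is granted with probability 1/4:

#include <stdlib.h>

#define ZSKIPLIST_MAXLEVEL 64  /* levels range from 1 to 64 */
#define ZSKIPLIST_P 0.25       /* chance of promoting one more level */

static int zsl_random_level(void) {
	int level = 1;
	while ((random() & 0xFFFF) < ZSKIPLIST_P * 0xFFFF)
		level += 1;  /* keep flipping a biased coin */
	return (level < ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}

This makes high levels exponentially rare, which is what gives the skip list its O(log n) expected search cost.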

Besides the nodes themselves, the skip list needs a managing structure. (Common bookkeeping lives there, such as the highest level among all nodes: searches should start from the current highest level rather than always from level 64.)

  • Header: points to the skip list's head node

  • Tail: indicates the node at the end of the skip list

  • Length: indicates the length of the skip list

  • Level: indicates the height of the skip list

    Through these properties, the program can obtain the skip list's head node, tail node, length, and height in O(1) time.
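Put together, the node and its managing structure look like this (close to the definitions in Redis 5's server.h; sds is the dynamic string type from section 1):

typedef char *sds;  /* simple dynamic string, see section 1 */

typedef struct zskiplistNode {
	sds ele;                           /* member */
	double score;                      /* sort key */
	struct zskiplistNode *backward;    /* previous node at the lowest level */
	struct zskiplistLevel {
		struct zskiplistNode *forward; /* next node at this level */
		unsigned long span;            /* elements crossed to reach forward */
	} level[];                         /* flexible array; height is random */
} zskiplistNode;

typedef struct zskiplist {
	struct zskiplistNode *header, *tail;
	unsigned long length;              /* number of nodes */
	int level;                         /* current highest level */
} zskiplist;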

In Redis, skip lists are mainly used in the underlying implementation of ordered sets (zsets); the other implementation of ordered sets is the compressed list. The Redis configuration file has two settings governing which one is used.

  1. zset-max-ziplist-entries 128: the maximum number of entries a zset may hold as a compressed list. The default is 128.
  2. zset-max-ziplist-value 64: the maximum string length of each element when a zset uses a compressed list. The default is 64.

Since a freshly created ordered set usually has few elements with short strings, Redis uses the compressed list as its underlying implementation by default. When a zset inserts a new element, it checks two conditions: whether the number of elements exceeds 128, and whether the string length of the inserted element exceeds 64. If either is met, Redis converts the zset's underlying compressed list into a skip list. Note: once converted to a skip list, it is never converted back to a compressed list. Skip list summary: the skip list trades space for time to improve query efficiency, supporting Redis's speed-first character.
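An illustrative redis-cli session (key and member names are made up; exact replies can vary by version):

127.0.0.1:6379> zadd rank 90 "alice"
(integer) 1
127.0.0.1:6379> object encoding rank
"ziplist"
127.0.0.1:6379> zadd rank 95 "aaaa...(a member longer than 64 bytes)...aaaa"
(integer) 1
127.0.0.1:6379> object encoding rank
"skiplist"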

3. Compressed lists

A ziplist is essentially a byte array: a linear data structure designed by Redis to save memory. It can hold multiple elements, and each element may be a byte array or an integer. Redis's ordered sets, hashes, and lists all use compressed lists either directly or indirectly. When an ordered set or hash has few elements and the elements are short strings, Redis uses the compressed list as the underlying storage structure. Lists are stored with the quicklist structure, a combination of a doubly linked list and compressed lists. For example, run the following commands to create a hash and view its encoding.
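An illustrative session (the key and field are made up):

127.0.0.1:6379> hset user:1 name "zhangsan"
(integer) 1
127.0.0.1:6379> object encoding user:1
"ziplist"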



You can see that the hash uses a compressed list.

Ziplist structure

The overall layout is zlbytes | zltail | zllen | entry1 ... entryN | zlend. Here is what each part stores:

  • zlbytes: the length of the compressed list in bytes, i.e. the number of bytes the whole ziplist occupies, including this field itself
  • zltail: the offset of the last entry, used to locate the last element quickly (convenient for back-to-front traversal)
  • zllen: a 16-bit unsigned integer storing the number of entries. If the number of elements exceeds 2^16 - 2, this field is set to 2^16 - 1, and the whole list must be traversed to find the real count.
  • entry: a stored element
  • zlend: an 8-bit unsigned integer marking the end of the whole ziplist; its value is always 255.

Knowing the overall structure of the ziplist, we now need the structure of an individual entry. Each entry starts with two prefix fields, optionally followed by data:

  • prevlen: the length of the previous entry, stored in either 1 byte or 5 bytes: if the previous entry is shorter than 254 bytes, prevlen takes 1 byte, otherwise 5 bytes. Given that the current entry starts at p, subtracting prevlen yields the start of the previous entry (entries are laid out contiguously; for example, if the current entry starts at offset 1000 and prevlen = 100, the previous element starts at offset 900). This is what makes back-to-front iteration easy.
  • encoding: the encoding of the current element; the actual data lives in content. The encoding field itself is variable-length to save memory, and its first two bits indicate whether the content field holds an integer or a byte array. Integers take less space to store; strings are kept as byte arrays for binary safety rather than being turned into numbers.
  • entry-data (content): the element's data, which does not necessarily exist (small integers can be embedded in the encoding field itself).
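A minimal sketch of the prevlen rule, mirroring zipStorePrevEntryLength in ziplist.c (endianness conversion omitted):

#include <stdint.h>
#include <string.h>

#define ZIP_BIG_PREVLEN 254  /* 0xFE marks the 5-byte form */

/* Encode prevlen at p; returns the number of bytes written. */
static unsigned int zip_store_prevlen(unsigned char *p, uint32_t prevlen) {
	if (prevlen < ZIP_BIG_PREVLEN) {
		p[0] = (unsigned char)prevlen;         /* 1-byte form */
		return 1;
	}
	p[0] = ZIP_BIG_PREVLEN;                    /* flag byte, 0xFE */
	memcpy(p + 1, &prevlen, sizeof(prevlen));  /* 4-byte length follows */
	return 5;
}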

Inserting an element

Since inserting a new element increases the space the compressed list needs (entries are contiguous, so storage must be reallocated), how much space is required? Not necessarily the length of the old compressed list plus the length of the new element, because of prevlen. Here's an example:

Suppose entryX is 128 bytes long, so the prevlen of the following entry, entryX+1, fits in 1 byte. If a new element entryNew of 1024 bytes is inserted between them, entryX+1's prevlen must now record 1024, which requires 5 bytes instead of 1. Note: later entries may need updating in turn.

Deletion can trigger the same effect: when entryX is deleted, entryX+1's prevlen must record the length of the entry that preceded entryX; if that grows prevlen from 1 byte to 5 bytes, entryX+1 itself becomes longer (say, 257 bytes), which may force entryX+2's prevlen to grow as well, and so on. This cascade is called a chain update, and since every expansion reallocates memory, it is inefficient. The chain stops at the first entry that does not need to grow; for example, if entryX+3 ends up at 100 bytes, the entry after it is unaffected and no further updates occur.

4. Dictionaries

A dictionary, also known as a hash table, is a data structure for storing key-value pairs; C has no such built-in structure. Redis is a K-V database, and the entire database is stored in a dictionary: adding, deleting, updating, and querying the Redis database really means adding, deleting, updating, and querying data in the dictionary. Features:

  1. It can store huge amounts of data; key-value pairs form a mapping, so given a key, the associated value can be fetched or inserted in O(1) time.
  2. The key in a key-value pair can be a string, integer, floating-point number, etc., and keys are unique.
  3. The value in a key-value pair can be of type String, Hash, List, Set, or SortedSet.

So how is Redis designed to satisfy these features?

  1. To store massive amounts of data, the structure needs a data field pointing to the memory where the data lives.
  2. "Given the key, the value must be reachable in O(1) time": this calls for an array as the implementation.

C arrays can locate an element by subscript in constant time, and can hold huge amounts of data as long as memory allows. However, array subscripts are integers, while our keys can be strings, floating-point numbers, and so on. To bridge this gap, the key needs special processing, namely hashing.

In practice, ready-made open-source hash algorithms are usually used. For example, Redis's native client uses "times 33", while the Redis server's hash function uses the SipHash algorithm (better randomness, but more complex). Times 33 is one of the best-known string hashing algorithms because it is fast and distributes well; Java's hashCode works the same way:

    public int hashCode() {
        int h = hash; // defaults to 0
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                // 31 is odd; any odd multiplier distributes similarly, but
                // 31 * h can be rewritten as (h << 5) - h, letting the JIT
                // replace the multiplication with a shift and a subtraction.
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

Note: although keys may conceptually be integers, strings, or floating-point numbers, what the Redis server receives from the client is always a string (a char[] holding the characters). Hashing solves the first problem by turning the string into a hash value, which then decides where the entry is stored. But a new problem appears: the hash value is huge while the array is not, so we take the hash value modulo the array length; the result is the key's index in the dictionary, which determines the position in the array, i.e. "index value == array subscript". The bookkeeping data therefore needs to record: 1. the array length; 2. how much data the array already holds. As shown below:
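A minimal sketch of the whole path from string key to array slot; hash_times33 is the "times 33" function mentioned above, and since Redis keeps table sizes at powers of two, the modulo becomes a bitwise AND:

#include <stddef.h>
#include <stdint.h>

/* "times 33" (DJB-style) string hash. */
static uint64_t hash_times33(const char *key, size_t len) {
	uint64_t h = 5381;
	for (size_t i = 0; i < len; i++)
		h = h * 33 + (unsigned char)key[i];
	return h;
}

/* size must be a power of two, so (size - 1) acts as the mask. */
static unsigned long index_for(const char *key, size_t len,
                               unsigned long size) {
	return (unsigned long)(hash_times33(key, len) & (size - 1));
}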

Hash conflict

Hashing plus modulo solves turning a string into an integer and an integer into an array subscript, but it creates a new problem: hash collisions, where two different keys map to the same array subscript. There are four classic ways to resolve them. 1. Open addressing: when a collision occurs, probe for another slot; with linear probing, the array is treated as circular, and insertion walks forward from the collision point until a free slot appears (lookups likewise walk forward until the key is found or an empty slot proves it absent). 2. Rehashing: keep several different hash functions; on collision, try the second, third, and so on. 3. Separate chaining: collided elements are linked into a list hanging off the slot. 4. A public overflow area: the hash table is split into a base table and an overflow table; every element that collides in the base table is placed in the overflow table. A lookup first compares against the corresponding base table slot; on success it is done, otherwise it searches the overflow table sequentially. When collisions are rare relative to the base table, this performs very well for lookups.

To resolve hash collisions, Redis uses chaining: each array element stores the "key" of the key-value pair, plus a next pointer and a pointer to the value. The details are shown in the figure below:



To sum up, looking up a value by key takes the following steps:

Step 1: the key is hashed and taken modulo the array length to get the index value, which locates the corresponding element.

Step 2: if the element's key equals the key being searched for, return the element's value; otherwise, if the next pointer is set, follow it to the next element and repeat step 2.

At this point the first half of feature 2 is covered. The other half is "keys are unique", so every insert into the dictionary first performs a lookup: if the key already exists, the operation becomes an update; otherwise an insert is performed.

Feature 3, "the value in a key-value pair can be a String, Hash, List, Set, or SortedSet", is achieved by making the val field in the array element a pointer, which can point to any memory holding the value.


Redis dictionary implementation

The Redis dictionary is also implemented with hash functions, but since Redis is an engineering system, more factors come into play. Let's start with a schematic of the structure:

1. dictht is the hash table. Its fields: table, the bucket array; size, the array length; sizemask, which equals size - 1 and is ANDed with the hash value to compute the index; and used, the number of elements already stored.

2. dictEntry is the hash table node. Each array element is represented by a dictEntry structure, which stores a key-value pair. The specific structure is as follows:

typedef struct dictEntry {
	void *key;              // the key
	union {
		void *val;          // value pointer (e.g. db.dict stores its values here)
		uint64_t u64;
		int64_t s64;        // e.g. an expiration time
		double d;
	} v;                    // the value; different union fields fit different scenarios
	struct dictEntry *next; // on a hash collision, chains conflicting entries into a singly linked list
} dictEntry;

The v field is a union storing the value of the key-value pair; different fields are used in different scenarios. Besides the hash table and hash table node, the Redis dictionary implementation wraps one more data structure around them at the outermost layer, called the dictionary (dict). Its main job: encapsulate the hash table again and supply auxiliary fields for special operations. Here is the complete Redis dictionary data structure:
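The two layers look like this (close to dict.h in Redis 5; dictEntry is the node shown above, while dictType and privdata carry per-dictionary callbacks and their context):

typedef struct dictType dictType;  /* hash/compare callbacks, omitted here */

typedef struct dictht {
	dictEntry **table;       /* bucket array */
	unsigned long size;      /* array length, always a power of two */
	unsigned long sizemask;  /* size - 1, ANDed with the hash to get the index */
	unsigned long used;      /* number of elements stored */
} dictht;

typedef struct dict {
	dictType *type;          /* type-specific callbacks (hash, compare, ...) */
	void *privdata;          /* context passed to the callbacks */
	dictht ht[2];            /* ht[0] in normal use; ht[1] during rehash */
	long rehashidx;          /* -1 when no rehash is in progress */
	unsigned long iterators; /* number of safe iterators currently running */
} dict;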

ht: an array of size 2 whose elements are of type dictht (hash table). Although there are two slots, ht[0] is used in normal operation; ht[1] only comes into play during rehash.

rehashidx: indicates whether the dictionary is rehashing. If not, the value is -1; otherwise it records how far the rehash has progressed through ht[0], i.e. the array subscript of the bucket currently being migrated.

iterators: records the number of safe iterators currently running. Rehash operations are suspended while safe iterators are bound to the dictionary. For example, the keys command creates a safe iterator, incrementing iterators by 1 and decrementing it when the command finishes; the sort command creates a normal iterator, which leaves this field unchanged.

Dictionary expansion

The expansion process is as follows: 1. Apply for new memory: 4 dictEntry buckets by default on the initial allocation; for a non-initial expansion, the new memory must be twice the current hash table's capacity. 2. Assign the newly obtained memory to ht[1] and change the dictionary's rehashidx from -1 to 0, indicating that a rehash operation will follow. The schematic diagram is as follows:

After expansion the dictionary's capacity and mask change, so the index obtained by ANDing the same key's hash with the mask changes too; handled naively, key lookups would start to fail.

The solution is to place the newly allocated memory in a new hash table (ht[1]) and mark the dictionary as rehashing (rehashidx changes from -1 to a valid index). From then on, new key-value pairs are stored in the new hash table, while update, delete, and find operations must check both ht[0] and ht[1] before deciding which hash table to operate on. In addition, all the data in the old hash table (ht[0]) is re-indexed and inserted into the new hash table (ht[1]). This process is called rehash.

Progressive rehash

Rehash is triggered not only by expansion but also by shrinking. The whole Redis rehash is divided into the following steps:

  1. Apply for enough space for hash table ht[1], whether expanding or shrinking. When expanding, the size is twice the current capacity (rounded up to a power of 2); when shrinking, the size is exactly the smallest power of 2 that still holds the nodes in use. Then set the dictionary's rehashidx field to 0.
  2. Call the dictRehash function to perform the rehash: recompute the hash and index of each key in ht[0], add it to the new hash table ht[1], delete the key-value pair from the old hash table, and advance the dictionary's rehashidx field to the index of the ht[0] bucket currently being rehashed.
  3. When the rehash finishes, clear ht[0], move ht[1] into ht[0], and reset the dictionary's rehashidx field to -1.
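A simplified sketch of the migration step, after dictRehash in dict.c, using the structures above (hash_of stands in for the dictionary's hash callback, and the real code's empty-bucket visit limit is omitted):

#include <stdlib.h>
#include <string.h>

unsigned long hash_of(dict *d, const void *key);  /* d->type's hash function */

/* Move up to n buckets of entries from ht[0] to ht[1].
 * Returns 1 if more buckets remain, 0 when the rehash is complete. */
static int dict_rehash(dict *d, int n) {
	if (d->rehashidx == -1) return 0;            /* not rehashing */
	while (n-- && d->ht[0].used != 0) {
		dictEntry *de;
		while ((de = d->ht[0].table[d->rehashidx]) == NULL)
			d->rehashidx++;                      /* skip empty buckets */
		while (de) {                             /* move the whole chain */
			dictEntry *next = de->next;
			unsigned long idx = hash_of(d, de->key) & d->ht[1].sizemask;
			de->next = d->ht[1].table[idx];      /* head-insert into ht[1] */
			d->ht[1].table[idx] = de;
			d->ht[0].used--;
			d->ht[1].used++;
			de = next;
		}
		d->ht[0].table[d->rehashidx++] = NULL;
	}
	if (d->ht[0].used == 0) {                    /* migration finished */
		free(d->ht[0].table);
		d->ht[0] = d->ht[1];
		memset(&d->ht[1], 0, sizeof(d->ht[1]));
		d->rehashidx = -1;
		return 0;
	}
	return 1;
}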

As we know, Redis provides high-performance online service from a single process. When the number of key-value pairs in the database reaches the millions, tens of millions, or billions, a one-shot rehash becomes very slow and, left unoptimized, could make the service seriously unavailable. Redis therefore applies divide and conquer to the rehash. The rough steps: before each insert, delete, find, or modify operation, check whether the dictionary is mid-rehash; if so, migrate just one bucket (one dictEntry chain, in my opinion) as a side effect. When the service is idle and the dictionary still needs rehashing, rehash in batches (about 100 buckets at a time, roughly 1 ms per round). After N such steps, all of ht[0]'s data has been migrated to ht[1]. This also reflects Redis's speed-first character: no rehashing when busy, trading space for time, and catching up when idle.

Structure summary: key-value storage is split across three parts: 1. the dictionary; 2. the hash table; 3. the hash table node.

Finding an element

  1. Compute the key's hash value with the hash function
  2. Derive the index value from the hash value
  3. If not in the rehash state, search one hash table; if rehashing, search both hash tables
  4. Walk the singly linked list at that slot; if an element's key matches the key sought, return the element
  5. Return NULL if nothing is found

Modifying an element

  1. First perform a lookup to see whether the key exists
  2. If not, abort
  3. If it does, point the key-value pair's value at the new value (actually, a new pointer)
  4. Free the old value's memory

Removing an element

  1. Find whether the key exists in the dictionary;
  2. If it exists, remove the node from its singly linked list;
  3. Free the memory occupied by the node's key, its value, and the node itself;
  4. Decrement the hash table's used counter by 1;
  5. When, after a series of operations, the data in use drops below 10% of the total capacity, a shrink is performed so that the memory occupied by the Redis database stays within a reasonable range and is not wasted.

Dictionary traversal

Rules for dictionary traversal: 1. do not repeat any data; 2. do not miss any data. Two approaches satisfy these rules: 1. full traversal with iterators; 2. intermittent (cursor-based) traversal.

Iterators can be divided into safe iterators and common iterators.

Common iterator: no add, delete, or modify operations are allowed while iterating. A fingerprint field in the iterator is strictly checked to ensure the data read does not repeat; if the dictionary is modified during the iteration, an exception is reported.

Safe iterator: when executing certain commands, Redis iterates dictionary data with a safe iterator, keys for example. The keys command returns all keys matching a given pattern, deleting expired keys along the way; since Redis key-value pairs live in the dictionary, keys traverses it with a safe iterator. Like the common iterator, it visits the hash table nodes in turn by repeatedly calling the dictNext function; the difference is that a safe iterator guarantees accurate reads by suppressing rehash. Why does suppressing rehash guarantee accuracy? If rehash and traversal ran at the same time, an element already visited at index 3 could be moved by the rehash to index 33 and be visited again, producing duplicates.

keys is a traversal algorithm with O(n) complexity; if an instance holds more than tens of millions of keys, the command stalls the Redis service, and other commands are delayed or even time out. So while using Redis, do not run commands that need a long time to complete (keys, smembers, hgetall, zrange), which may block Redis and affect the execution of other commands. If necessary, use the corresponding scan command for a progressive traversal, which effectively prevents blocking.

Intermittent traversal: when a Redis database holds a huge amount of data, a full traversal with the keys command takes too long. To solve this, Redis 2.8 added "intermittent traversal", used when iterating dictionary data piecewise: the scan command iterates the keys of the whole database, hscan iterates the fields of a hash, and zscan iterates all members and scores of an ordered set. These are all dictionary traversals implemented by the dictScan function, and they even tolerate concurrent rehash operations.

While dictScan traverses the dictionary intermittently, three situations can arise.

  1. From the beginning to the end of the iteration, the hash table does not rehash

  2. From the beginning to the end of the iteration, the hash table is expanded or shrunk, and the rehash operation is completed between exactly two iterations

  3. From the beginning to the end of an iteration, the hash table is being rehashed for one or more iterations

For cases 1 and 2, no rehash operation is encountered during the traversal.

For case 1, you only need to iterate over the nodes of hash table ht[0] in sequence.

In case 2, the dictionary may be expanded or shrunk partway through the overall traversal. If traversal simply proceeded in sequence, data might be read repeatedly: by the expansion rule, a key-value pair at subscript 0 may land at subscript 0 or 4 after expansion, so if the first call covered buckets 0 to 3 and an expansion happened before the second, elements originally at subscript 0 could be visited again. To miss no data and repeat as little as possible, Redis adopts a method called reverse binary iteration for its piecewise scans. Its core cursor logic is as follows:
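Below is a trimmed sketch of the cursor-advance trick from dictScan in dict.c; rev reverses the bits of an unsigned long, and m0 is the table's sizemask:

/* Reverse the bits of v (the mask-halving method used in dict.c). */
static unsigned long rev(unsigned long v) {
	unsigned long s = 8 * sizeof(v);  /* bit count, a power of two */
	unsigned long mask = ~0UL;
	while ((s >>= 1) > 0) {
		mask ^= (mask << s);
		v = ((v >> s) & mask) | ((v << s) & ~mask);
	}
	return v;
}

/* Advance the scan cursor: add 1 to the reversed cursor, i.e. count
 * from the high-order bit downward instead of the usual low-order +1. */
static unsigned long scan_next(unsigned long v, unsigned long m0) {
	v |= ~m0;   /* set the bits above the mask so the carry falls off */
	v = rev(v);
	v++;
	return rev(v);
}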

It works like this: suppose a table of size 4 is iterated in cursor order 0, 2, 1, 3 (binary 00, 10, 01, 11), and 0 and 2 have already been visited when the table expands to 8. The intermittent iteration then only has 1, 5, 3, 7 left to visit, because everything that rehashes out of buckets 0 and 2 was already covered. Conversely, if the original table size is 8 with iteration order 0, 4, 2, 6, 1, 5, 3, 7, and after visiting 0, 4, 2, and 6 the table shrinks to 4, the iteration only needs to visit 1 and 3.

Therefore, as long as no rehash is in progress at the moment of each call, this cursor algorithm covers the whole hash table no matter what shrinks or expansions happen in between. (Repeats are possible only in corner cases such as two shrinks in a row, and the probability is small, because tables of every size are visited with the high-order bits first.)

For case 3, a rehash operation is encountered during the traversal.

From the beginning to the end of such an iteration, the hash table is mid-rehash during one or more calls, so two hash tables coexist: one is the expanded or shrunken ht[1], the other is the old ht[0], whose data is gradually migrating to ht[1] through progressive rehash.

Because one table is small and the other large, data must be fetched from both ht[0] and ht[1]. The overall process: find the smaller of the two hash tables, traverse the cursor's bucket in the smaller table first, then the corresponding buckets in the larger one, merge the results, and send them to the client. Note: scan results may contain duplicates.

Understanding scan with an example: first add some keys to Redis, then run

scan 0 match k* count 5 

The first argument, 0, is the cursor: the array position to start from.

The second is the pattern the keys must match.

Note that some slots are empty while others hold long linked lists.

Suppose the returned cursor is 26: of the slots scanned from 0 to 26, 21 were empty, and the qualifying slots contained three matching keys. This is why count 5 does not mean exactly 5 keys come back.

Summary: in the dictionary, Redis's design speeds things up in the following ways:

  1. The dictionary itself is a data structure designed for fast lookup;
  2. To keep expansion and shrinking from causing a slow, blocking rehash, progressive rehash was introduced;
  3. To keep traversal from being too slow when walking the data, intermittent traversal is also provided.

5. Integer sets

First: what problem do integer sets solve? Answer: making full use of memory. The premise: the integer set is an optimization underneath the set type.

An integer set is an ordered structure for storing integer data. Redis lives in memory, so we have to think about how to use memory efficiently. This structure is used when all the elements of a Redis set are integers that fit in the range of a 64-bit signed integer. Example:

The first set, testSet, is stored with the intset encoding, while the second, test2, is stored as a hashtable. Inside an integer set, the data is kept sorted from smallest to largest so that it can be searched with binary search.

Its structure is as follows:
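In C, the layout matches Redis's intset.h:

#include <stdint.h>

typedef struct intset {
	uint32_t encoding;  /* bytes per element: int16, int32, or int64 */
	uint32_t length;    /* number of elements */
	int8_t contents[];  /* elements, sorted ascending, each one
	                       'encoding' bytes wide */
} intset;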

Encoding: Indicates the encoding type, which determines how many bytes each element occupies.

To determine which encoding a value needs, just look at the range the value falls in:

int64: [-9223372036854775808, -2147483649] or [2147483648, 9223372036854775807]

int32: [-2147483648, -32769] or [32768, 2147483647]

int16: [-32768, 32767]
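A sketch of the selection rule, after _intsetValueEncoding in intset.c, which uses the element width in bytes as the encoding value:

#include <stdint.h>

/* Pick the narrowest width able to hold v. */
static uint8_t intset_value_encoding(int64_t v) {
	if (v < INT32_MIN || v > INT32_MAX)
		return sizeof(int64_t);   /* needs 8 bytes */
	else if (v < INT16_MIN || v > INT16_MAX)
		return sizeof(int32_t);   /* needs 4 bytes */
	else
		return sizeof(int16_t);   /* fits in 2 bytes */
}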

The data we just saved looks like this; when all the data is returned, it comes back in order from small to large.

Now suppose I need to add a 5: first check (by binary search) whether the value is already present. If it is found, return directly (a set holds no duplicate values); if not, reallocate a block of memory and migrate the data into it, leaving room for the new element.

If instead I add 50000 at this point, the encoding type must be upgraded, and a number that forces an upgrade is necessarily either the new smallest or the new largest element; here it is the largest.

Then a new block of memory is allocated and the original elements are moved from back to front, so that no data is overwritten.

Summary: the integer set is an optimization of set. When everything stored is an integer, using an integer set improves the set's lookup efficiency and, of course, saves memory.

6. Quicklist

Quicklist is one of the most important data structures underlying Redis: it is the underlying implementation of the List type and was introduced in Redis 3.2. Before quicklist, Redis used ziplist and adlist (a doubly linked list) as the underlying implementations of List: when the number of elements was small and the elements were short, Redis used the compressed list as storage; once either condition failed, it switched to the doubly linked list. The reason: with short elements, a ziplist saves storage effectively, but as elements grow longer, every modification forces the storage to be reallocated, which inevitably hurts Redis's execution efficiency, so the doubly linked list is used instead.

Since Redis 3.2, quicklist is used instead. Quicklist is a new data structure that weighs time efficiency and space efficiency together.

As we all know, looking up a list element by position is O(n), so plain lists do not suit fast lookup. Quicklist, new in Redis 3.2, can be thought of as several small ziplists linked together into a doubly linked list.

Extreme cases:

Case 1: with too many ziplist nodes, the quicklist degenerates into a plain doubly linked list, with poor efficiency. In the worst case each ziplist holds a single entry, i.e. a doubly linked list of one-element nodes (query time complexity goes up).

Case 2: with too few ziplist nodes, the quicklist degenerates into a single ziplist; the extreme case is a quicklist with only one ziplist node (each added element may force the ziplist to reallocate its space).

So quicklist is a structure that balances time and space efficiency: the ziplists raise space utilization, while the doubly linked list keeps element insertion cheap.

Data storage

Quicklist is a doubly linked list whose nodes are ziplists. Its storage structure looks like this:



Quicklist has the following core structures:

typedef struct quicklist {
	quicklistNode *head;
	quicklistNode *tail;
	unsigned long count; // total number of elements across all ziplists
	unsigned long len;	// number of quicklistNodes
	int fill : 16; // per-node ziplist limit: a positive fill is the maximum number of entries in each ziplist; a negative fill selects a maximum ziplist size class instead (e.g. -2 means 8 KB per node)
	unsigned int compress : 16; // how many nodes at each end of the list stay uncompressed; 0 disables compression
}  quicklist;

To save space, Redis allows the quicklistNodes in the middle of the list to be compressed, controlled by the list-compress-depth parameter. For example, with compress = 1 and 3 quicklistNodes, only the middle node is compressed; the head and tail nodes stay uncompressed.
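The two knobs together, as they might appear in redis.conf (values here are illustrative; check your version's defaults):

list-max-ziplist-size -2   # cap each node's ziplist at 8 KB (negative values pick size classes, positive values cap the entry count)
list-compress-depth 1      # leave 1 node uncompressed at each end; 0 disables compression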

Note: each quicklist node is a quicklistNode, which has the following structure:

typedef struct quicklistNode {
	struct quicklistNode *prev;
	struct quicklistNode *next;
	unsigned char *zl;           // this node's ziplist (or its LZF-compressed form)
	unsigned int sz;             // size of the ziplist in bytes
	unsigned int encoding : 2;   // 1 = raw ziplist, 2 = LZF compressed
	unsigned int recompress : 1; // temporarily decompressed, to be recompressed later
} quicklistNode;

Data compression

Each quicklist node actually stores its data in a ziplist, whose main advantage is saving storage space. To shrink the ziplists further, Redis allows them to be compressed with the LZF algorithm. Compressed data is divided into multiple fragments, each with two parts: an explanation field and a field storing the concrete data.

So how does compression save storage? The basic idea of LZF compression: where the data repeats something seen earlier, record the position and length of the repetition; otherwise record the original bytes directly.

Adding elements

Quicklist provides push operations that insert at the head or the tail. If the head or tail node cannot take another element, a new quicklistNode is created and linked into the quicklist structure.

Besides push, quicklist also provides insertion at an arbitrary position. Two cases arise: the target node can still accept the insertion, or it cannot.

1) The quicklistNode at the insertion position can still hold the element: insert directly.

2) The quicklistNode at the insertion position cannot hold another element. Three sub-cases:

  1. The element must go before the first element of the current quicklistNode: if the previous quicklistNode's ziplist has room, insert the data into that previous node; if it cannot take the element either, create a new quicklistNode and link it before the current one.
  2. The element must go after the last element of the current quicklistNode: if the next quicklistNode's ziplist has room, insert the data into that next node; if it cannot take the element either, create a new quicklistNode and link it after the current one.
  3. In every other case, i.e. an insertion into the middle of a full node, split the current quicklistNode into two quicklistNodes at the insertion point and put the new data into one of the halves.

Removing elements

Whether deleting a single element or a range of elements, the first step is to find the element(s) to delete.

Changing an element

Because each quicklist node holds a ziplist, and a ziplist is stored contiguously in memory, changing one element in place may affect all subsequent elements. Quicklist therefore uses a delete-then-insert scheme: the element at the given index is deleted and the new value is inserted at the same position.

Finding an element

Quicklist lookup is mainly by index: given an element's position in the list, find the corresponding element. The basic idea is to locate the quicklistNode holding that index, then call ziplistGet to fetch the data.

Summary: quicklist is mostly about saving memory, using compressed lists to do so. But if a single compressed list held too many elements, every modification would reallocate its storage, costing time; so the elements are spread over a doubly linked list of small ziplists, which keeps modification cheap. (If elements are appended in order, reallocation is rare; but adding or removing an element in the middle forces a reallocation, possibly with cascade updates.)

Conclusion

Most of the knowledge above comes from the design and implementation of Redis 5, plus some simple thoughts of my own, written down as notes. If anything is unclear, go have a look at that book; I think it is written very well. Your suggestions are also welcome.