Redis core data structure underlying implementation

String

The simple Dynamic String (SDS) is used.

Before version 3.2

struct sdshdr {
    / / the length
	int len;
    // The remaining length
    int free;
    // Data ontology
    char buf[]; 
}
Copy the code

After Version 3.2

There are mainly the following fields

The length of len can be divided into uint8_t, uint16_t and uint64_t
Alloc Has allocated space of several types: uint8_t, uint16_t and uint64_t
Unsigned char flags indicates the type, and len is used in the minimum length
Char buf[] Specifies where data is stored

The details are as follows

typedef char *sds;

/* Note: sdshdr5 is never used, we just access the flags byte directly. * However is here to document the layout of type 5 SDS strings. */
struct __attribute__(((packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];// buf[0]: z: 0101001
};
struct __attribute__(((packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__(((packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__(((packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__(((packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
Copy the code

Expansion strategy

If the string length is less than 1 MB, the space expansion policy is double, that is, 100% redundant space is reserved. When the length exceeds 1 MB, only 1 MB more redundant space is allocated for each expansion to avoid excessive redundant space waste.

Why not just use native C strings

Native C strings are terminated by \0. To know the length of the string, you need to obtain the O(n) time complexity of the Strlen standard library (traversal scan). Redis is single-threaded and performance needs to be optimized.
Redis, as open source middleware, may accept calls from many types of languages, other types of languages that provide \0 as part of a string to Redis will not be supported if REids uses the C native library.

List

New version: Use QuickList see below for more information

Hash (dictionary)

Storage data structure

zipList

When the amount of data is small or a single element is small, the underlying storage is zipList. Hash-max-ziplist-entries 512 If the ziplist element size exceeds 512, change it to hashTable hash-max-ziplist-value 64 If the single element size exceeds 64 bytes, change it to HashTable

dict

struct dict {
    dictht ht[2]}Copy the code

The dict structure contains two hashtables inside, and typically only one hashtable is used. When the dict is scaled down, the hashtables need to be redistributed and then progressively relocated. In this case, the two Hashtables store the old one and the new one respectively. After the relocation is complete, the old hashtable is deleted and the new hashtable is replaced. The structure of a Hashtable is almost the same as that of a Java Hashmap. Hash conflicts are resolved in the form of a linked list. The first dimension is data and the second dimension is a linked list. The data stores a pointer to the first element of the second two-dimensional list.

Incremental relocation

This is an O(N) level operation and can be very time-consuming, so Redis uses progressive rehash to move the dictionary in small steps, embedding the move in hset and hdel commands. If the client is idle and there are no subsequent commands, Redis moves in a scheduled task.

Expansion conditions

Normally, when the number of elements in the hash table equals the length of the first array, the hash table expands to a new group twice the size of the original array. However, if Redis is doing BGSave, Redis tries not to expand (dict_can_resize) in order to reduce excessive copy on write, but if the hash table is already very full, The number of elements has reached 5 times the length of the first array (dict_forece_resize_ratio), indicating that the hash table is overcrowded and will be forced to expand.

Shrinkage volume

When the hash table becomes sparse due to primary key deletion, Redis shrinks the hash to reduce the first dimension of the hash table. The reduction condition is that the number of elements is less than 10% of the array length. Scaling does not take into account whether Redis is doing BGSave

Set

intset

If the set contains only a small number of integer elements, use intset to store set-max-int-set-entries 512 Maximum number of elements that an intset can store, or use hashTable encoding

graphic

hashtable

Same as Hash, but value is null

ZSet(Sorted Set)

Zset is a hash dictionary and a skiplist for low data loads and high data loads

ZipList (Compressed list)

It is designed to improve storage efficiency. Can be used to store strings or integers, where integers are encoded as true binary representations rather than as sequences of strings. Ziplist can provide push and POP operations on both ends of the table with O(1) time complexity.

graphic

Add elements

Inserting a new element requires realLOC to expand memory, depending on the algorithm of the memory allocator and the current Ziplist. If RealLOC may need to reallocate the new memory space and copy the previous content to the new address at once, it can also expand the existing address.

Cascade update

Because each entry has a prerawlan, it uses one byte if the content is less than 254, or five bytes otherwise. If an entry changes from 253 bytes to 254 bytes, then the prerawlen of its next entry needs to be updated from 1 to 5 bytes: If this entry was originally 253 bytes long, the prerawlen field for subsequent entries would have to be updated. If each entry in ziplist contains 253 bytes of content, changes to the content of the first entry will result in cascading updates of all subsequent entries.

The future development

May be replaced by listpack (compact list)

quickList

A hybrid of Ziplist and LinkedList

graphic

How many elements each Ziplist stores

By default, a ziplist within QuickList is 8K bytes long. If a ziplist exceeds this length, a new ziplist is created. The length of the ziplist is determined by the list-max-ziplist-size configuration

The depth of the compression

Quicklist’s default compression depth is 0, meaning no compression. The actual compression depth is determined by the list-compressed-depth parameter. In order to support fast-reading push/pop operations, the first and last ziplists of the Queicklist are uncompressed and have a depth of 1. A depth of 2 means that neither the first ziplist at the end of the QuickList nor the second ziplist at the end of the QuickList is compressed.

Skiplist

Similar to the structure of a skip table, the differences are as follows

graphic

Lookup process

It’s kind of a skip watch

Random –

For newly inserted nodes, need to call a random algorithm to assign him a reasonable layer, each layer promotion probability is about 50%. Redis standard source code promotion probability is only 25%, relative to the official skip list is more flat.

Insertion process

First we touch out the [search path] in the process of searching for the appropriate insertion point, and then we can start to create a new node by randomly assigning a number of layers to the node, and then connecting the nodes on the search path to the new node through a forward and backward pointer. If the height of the new node assigned is higher than the current maximum jump list height, update the maximum jump list height.

The update process

When we call the zadd method, if the corresponding value does not exist, it is an insert. If the value already exists, just adjusting the score is the update process. If the score value does not change the sort position, the update will be returned. (This logic has been officially accepted by Redis.) If the sorting position changes, you need to adjust the position.

Adjust the position

A simple strategy is to delete the value and then insert the element, doing two path searches.

How to calculate rankings

Redis has optimized skiplist’s forward Pointers by adding a span property to each forward pointer. Span stands for span and indicates how many nodes will be skipped if the previous node jumps between the current node and the forward pointer of the current layer. Redis updates the span value during insert and delete operations. In this way, when calculating the rank of an element, you only need to stack the SPAN values of all nodes in the search path to calculate the final rank value of the element.

Redis core data structure underlying implementation

String

Before version 3.2

After Version 3.2

Expansion strategy

Why not just use native C strings

List

Hash (dictionary)

Storage data structure

zipList

dict

Incremental relocation

Expansion conditions

Shrinkage volume

Set

intset

graphic

hashtable

ZSet(Sorted Set)

ZipList (Compressed list)

graphic

Add elements

Cascade update

The future development

quickList

graphic

How many elements each Ziplist stores

The depth of the compression

Skiplist

graphic

Lookup process

Random –

Insertion process

The update process

Adjust the position

How to calculate rankings

Related Posts

Chapter 3 Using Multidimensional Storage (Global Variables) (4)

⚡ crawler advanced ⚡ Five lines of code identification verification code — DDDDOCR (with brother OCR)

Stream in Java8 is so tough, do you know how it works?