I. Foreword

In this article, we introduce in detail the underlying data structures behind Redis's five data types. In Redis, a database key is always a string object, while its value can be one of five object types: a string object, a list object, a hash object, a set object, or a sorted set object. These objects are built on the following underlying data structures: the simple dynamic string, the linked list, the dictionary, the skip list, the integer set, and the compressed list.

II. Simple Dynamic Strings

Instead of using plain C strings, Redis builds an abstract type called the Simple Dynamic String (SDS) and uses it as its default string representation. C strings appear only as string literals in places where a string value never needs to be modified (such as printing logs). Whenever Redis needs a string value that can be modified, it represents the string with an SDS. SDS is also used as a buffer: the AOF buffer in the AOF module and the input buffer in the client state are both implemented with SDS.

1. Definition

1.1 Simple dynamic string structure

    struct sdshdr {
        // Records the number of bytes used in the buf array,
        // which equals the length of the string held by the SDS
        int len;
        // Records the number of unused bytes in the buf array
        int free;
        // Byte array used to hold the string
        char buf[];
    };

1.2 Example

Save the string “Redis” with SDS as shown below:

2. Differences between SDS and C strings

1. Getting the length of a string in constant time

Thanks to the len attribute, the length of an SDS string can be obtained simply by reading len, with O(1) time complexity. In C, getting the length of a string is normally done by traversing it and counting characters, which is O(N). The STRLEN key command returns the length of the string stored at a key.
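A minimal sketch of the difference (the struct mirrors the sdshdr definition above; sds_len and c_len are illustrative helpers, not the actual Redis API):

    #include <string.h>

    struct sdshdr {
        int len;      // bytes used in buf
        int free;     // bytes unused in buf
        char buf[];   // string data, still null terminated
    };

    // O(1): just read the stored length
    static inline int sds_len(const struct sdshdr *s) {
        return s->len;
    }

    // O(N): the C library has to scan the bytes until it finds '\0'
    static inline size_t c_len(const char *s) {
        return strlen(s);
    }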

2. Prevent buffer overflow

Another consequence of C strings not recording their own length is that they are prone to buffer overflows.

Suppose two strings s1 and s2 sit next to each other in memory, where s1 holds the string "redis" and s2 holds the string "MongoDB".

If the content of s1 is then changed to "Redis Cluster" without first allocating enough space for s1, the data of s1 overflows into the space occupied by s2, and the content held by s2 is accidentally modified.

When an SDS API needs to modify an SDS, it first checks whether the SDS has enough space for the modification. If not, the API automatically expands the SDS to the required size before performing the actual modification. With SDS there is therefore no need to resize the buffer manually, and the buffer overflow problem described above cannot occur.

3. Reducing the number of memory reallocations caused by string modifications

Because a C string does not record its own length, the underlying implementation of a C string containing N characters is always an array of N+1 characters (the extra byte holds the terminating null character). Because of this coupling between the length of the C string and the length of the underlying array, every time a C string is grown or shortened, the program has to reallocate the array that holds it:

  1. If the program is growing the string, the program needs to expand the size of the underlying array through memory reallocation. Forgetting this step can cause a buffer overflow

  2. If a program shortens a string, it will need to reallocate memory to free up space that is no longer used by the string. Forgetting this step will result in a memory leak

SDS decouples the string length from the length of the underlying array by keeping unused space: in an SDS, the length of the buf array is not necessarily the number of characters plus one; the array may contain unused bytes, and the number of unused bytes is recorded in the SDS's free attribute. With this unused space, SDS implements two optimization strategies, space preallocation and lazy space release:

  1. Space preallocation: space preallocation optimizes string growth operations. When an SDS has to be expanded, the program allocates more memory than is strictly required (not only the space needed for the modification, but also extra unused space), which reduces the number of memory reallocations needed for consecutive growth operations (see the sketch after this list).

  2. Lazy space release: lazy space release optimizes string shortening operations. When a string is shortened, the program does not immediately reallocate memory to reclaim the freed bytes; instead it records their number in the free attribute so they can be reused later. (SDS also provides an API to release unused space manually when needed.)
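A simplified sketch of both strategies; the growth policy mirrors the rule used by Redis's sdsMakeRoomFor (the 1 MB threshold corresponds to SDS_MAX_PREALLOC), but grow_policy and shorten are illustrative names, not the real API:

    #include <stddef.h>

    #define PREALLOC_LIMIT (1024 * 1024)  // 1 MB, mirrors SDS_MAX_PREALLOC

    // Space preallocation: given the length needed after a growth operation,
    // decide how much buf space to allocate so future appends avoid reallocs.
    static size_t grow_policy(size_t needed_len) {
        if (needed_len < PREALLOC_LIMIT)
            return needed_len * 2;           // small strings: double the space
        return needed_len + PREALLOC_LIMIT;  // large strings: add at most 1 MB
    }

    // Lazy space release: shortening only moves bytes from len to free,
    // no reallocation happens (illustrative struct, see sdshdr above).
    struct sds_sketch { size_t len, free; char *buf; };

    static void shorten(struct sds_sketch *s, size_t new_len) {
        if (new_len < s->len) {
            s->free += s->len - new_len;  // keep the memory, remember it as free
            s->len   = new_len;
            s->buf[new_len] = '\0';
        }
    }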

4. Binary safety

A C string marks the end of the string with a null character, but binary data such as images may itself contain null bytes, so such content cannot be stored correctly in a C string. All SDS APIs treat the data in buf as raw binary: SDS does not determine the end of the string by a null character, but by the length recorded in the len attribute.

5. Compatible with some C string functions

Although SDS is binary safe, it still follows the C convention of ending the buffer with a null character, which makes it possible to reuse some functions from the C library <string.h>.

6. Summary

III. Linked Lists

The linked list is a commonly used data structure. The C language has no built-in linked list, so Redis builds its own implementation. Linked lists provide efficient node reordering and sequential node access, and their length can be adjusted flexibly by adding and removing nodes.

The linked list is one of the underlying implementations of list keys: Redis uses a linked list as the underlying implementation of a list key when the key contains a large number of elements or when the elements are long strings.

In addition, linked lists are used by features such as publish/subscribe, the slow query log, and monitor. The Redis server itself also uses linked lists to hold the state of multiple connected clients and to build client output buffers.

1. Definition

1.1 Linked list node structure

    typedef struct listNode {
        // Previous node
        struct listNode *prev;
        // Next node
        struct listNode *next;
        // Node value
        void *value;
    } listNode;

1.2 Example

Multiple listNode structures can be linked through their prev and next pointers to form a doubly linked list.

Redis also provides data structures for manipulating linked lists.

2.1 Linked list structure

    typedef struct list {
        // Head node
        listNode *head;
        // Tail node
        listNode *tail;
        // Number of nodes the list contains
        unsigned long len;
        // Node value duplication function
        void *(*dup)(void *ptr);
        // Node value free function
        void (*free)(void *ptr);
        // Node value comparison function
        int (*match)(void *ptr, void *key);
    } list;

2.2 Example

2. Characteristics

The features of Redis’s linked list implementation can be summarized as follows:

  1. Double-ended: linked list nodes have prev and next pointers, so getting a node's previous or next node is O(1).

  2. Acyclic: the head node's prev pointer and the tail node's next pointer both point to NULL, and traversal of the list stops at NULL.

  3. Head and tail pointers: through the head and tail pointers of the list structure, the program can get the head and tail nodes of the list in O(1).

  4. Length counter: the program uses the len attribute of the list structure to keep track of the number of nodes the list holds, so getting the length of the list is O(1).

  5. Polymorphic: list nodes store node values through void* pointers, and type-specific functions can be set for node values through the dup, free, and match attributes of the list structure, so linked lists can hold values of various types (a traversal sketch follows this list).
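As a usage sketch, the snippet below mirrors the listNode and list structures defined above (with the type-specific function pointers omitted) and walks the list from head to tail; it is illustrative only and does not use the actual Redis list API:

    #include <stdio.h>

    // Simplified versions of the structures defined above.
    typedef struct listNode {
        struct listNode *prev;
        struct listNode *next;
        void *value;
    } listNode;

    typedef struct list {
        listNode *head;
        listNode *tail;
        unsigned long len;
    } list;

    // Walk the list from head to tail; each step follows a next pointer,
    // so visiting every node is O(N), while head/tail/len lookups are O(1).
    static void print_string_list(const list *l) {
        for (listNode *n = l->head; n != NULL; n = n->next)
            printf("%s\n", (const char *)n->value);
        printf("length: %lu\n", l->len);
    }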

IV. Dictionaries

A dictionary, also known as a symbol table, associative array, or map, is an abstract data structure for holding key-value pairs. Each key in the dictionary is unique and can be used to find or modify values.

1. Definition

1.1 Hash table structure

    typedef struct dictht {
        // Hash table array
        dictEntry **table;
        // Hash table size
        unsigned long size;
        // Size mask for computing index values, always equal to size - 1
        unsigned long sizemask;
        // Number of nodes the hash table currently holds
        unsigned long used;
    } dictht;
  • The table property is an array in which each element is a pointer to a hash table node (dictEntry), each of which holds a key-value pair.

  • The size property records the size of the hash table, that is, the size of the table array;

  • The sizemask attribute always equals size-1. This attribute, together with the hash value, determines which index the key should be placed on;

  • The used property records the number of existing nodes in the hash table.

1.2 Example

An empty hash table of size 4 looks like this:

A hash table is made up of an array table, where each element points to a dictEntry structure.

2.1 Hash table node structure

    typedef struct dictEntry {
        // Key
        void *key;
        // Value
        union {
            void *val;
            uint64_t u64;
            int64_t s64;
        } v;
        // Pointer to the next hash table node, forming a linked list
        struct dictEntry *next;
    } dictEntry;
  • The key property holds the key in a key-value pair;

  • The v attribute holds the value of the key-value pair; it is defined as a union and can be a pointer, a uint64_t integer, or an int64_t integer.

  • The next property is a pointer to another hash table node that can resolve key collisions by joining together multiple key-value pairs with the same hash value. Two keys k1 and k0 with the same index value are joined together by the next pointer.

2.2 Example

3.1 Dictionary Structure

    typedef struct dict {
        // Type-specific functions
        dictType *type;
        // Private data
        void *privdata;
        // Hash tables
        dictht ht[2];
        // Rehash index
        // -1 when no rehash is in progress
        int rehashidx;
    } dict;
  • The type attribute is a pointer to a dictType structure; each dictType structure holds a set of functions for operating on key-value pairs of a specific type, and Redis sets different type-specific functions for dictionaries with different purposes.

  • The privdata attribute holds optional arguments that are passed to those type-specific functions.

  • The ht attribute is an array of two items, each of which is a dictht hash table. In general a dictionary only uses the ht[0] hash table; ht[1] is only used when rehashing ht[0].

  • The rehashidx attribute records the current progress of a rehash; its value is -1 when no rehash is in progress.

3.2 Example

A dictionary in the normal state (without rehash) would look like this:

2. Hash algorithm

To add a new key-value pair to a dictionary, the program first computes a hash value and an index value from the key of the pair. Then, based on the index value, the hash table node containing the new key-value pair is placed at the corresponding slot of the hash table array.

Redis calculates hash and index values as follows:

    // Compute the hash value of the key using the dictionary's hash function
    hash = dict->type->hashFunction(key);

    // Compute the index value using the size mask;
    // depending on the situation, ht[x] can be ht[0] or ht[1]
    index = hash & dict->ht[x].sizemask;

3. Resolve key conflicts

When two or more keys are assigned to the same index of a hash table array, we say the keys are in conflict.

Each hash table node has a next pointer, and multiple hash table nodes can use their next pointers to form a singly linked list. Nodes assigned to the same index are connected through this singly linked list, which solves the key conflict problem.

Because the linked list of dictEntry nodes has no tail pointer, the program always inserts new nodes at the head of the list (O(1) complexity), ahead of the existing nodes, for speed.

Using a linked list to resolve the conflict between k2 and k1:
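A minimal sketch of head insertion with chaining, based on the dictEntry and dictht structures above (the value union is simplified to a single pointer, and ht_insert is an illustrative helper, not the Redis API):

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct dictEntry {
        void *key;
        void *val;                // value union simplified to one pointer
        struct dictEntry *next;   // chain of entries that share the same slot
    } dictEntry;

    typedef struct dictht {
        dictEntry **table;
        unsigned long size;
        unsigned long sizemask;   // size - 1
        unsigned long used;
    } dictht;

    // Insert a key/value at the head of its bucket's chain:
    // O(1), and no tail pointer is needed.
    static int ht_insert(dictht *ht, void *key, void *val, uint64_t hash) {
        unsigned long idx = hash & ht->sizemask;   // same index formula as above
        dictEntry *entry = malloc(sizeof(*entry));
        if (entry == NULL) return -1;
        entry->key  = key;
        entry->val  = val;
        entry->next = ht->table[idx];              // old head becomes second node
        ht->table[idx] = entry;                    // new node becomes the head
        ht->used++;
        return 0;
    }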

4. Expand and shrink

In order to keep the load factor of the hash table within a reasonable range, when the hash table holds too many or too few key-value pairs, the program needs to expand or shrink the size of the hash table accordingly.

1. Expansion and contraction steps

The work of expanding and contracting hash tables can be done by performing a rehash operation. Redis rehashes the dictionary’s hash table as follows:

  1. Allocate space for the dictionary's ht[1] hash table; its size depends on the operation to be performed and on the number of key-value pairs ht[0] currently contains (that is, the value of ht[0].used):

    • For an expansion, the size of ht[1] is the first power of two (2^n) that is greater than or equal to ht[0].used * 2.

    • For a contraction, the size of ht[1] is the first power of two (2^n) that is greater than or equal to ht[0].used.

  2. Rehash all key-value pairs held in ht[0] onto ht[1]: recompute each key's hash and index values using the formulas above and place each pair at the corresponding position in ht[1].

  3. When all key-value pairs in ht[0] have been migrated to ht[1] (ht[0] becomes empty), release ht[0], make ht[1] the new ht[0], and create a new blank hash table for ht[1] in preparation for the next rehash.

2. Triggering conditions for expansion and contraction

Hash table load factor formula: load_factor = ht[0].used / ht[0].size (load factor = number of nodes stored in the hash table / size of the hash table).

  1. When either of the following conditions is met, the program automatically begins to expand the hash table:

    • The server is not currently executing BGSAVE or BGREWRITEAOF, and the hash table load factor is greater than or equal to 1.

    • The server is running the BGSAVE or BGREWRITEAOF command and the hash table load factor is greater than or equal to 5.

  2. When the load factor of the hash table is less than 0.1, the program automatically starts to shrink the hash table.
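A sketch of the trigger logic described above, under simplified assumptions (only the fields needed for the decision are modelled; child_saving stands in for "a BGSAVE or BGREWRITEAOF child process is running" and is not an actual Redis field):

    #include <stdbool.h>

    // Simplified view of ht[0]: only the fields needed for the decision.
    struct ht_stats { unsigned long used, size; };

    // Should the table be expanded? Thresholds follow the rules above:
    // load factor >= 1 normally, or >= 5 while BGSAVE / BGREWRITEAOF runs.
    // Assumes size > 0.
    static bool needs_expand(struct ht_stats ht0, bool child_saving) {
        double load_factor = (double)ht0.used / ht0.size;
        return child_saving ? load_factor >= 5.0 : load_factor >= 1.0;
    }

    // Should the table be contracted? Load factor below 0.1.
    static bool needs_shrink(struct ht_stats ht0) {
        double load_factor = (double)ht0.used / ht0.size;
        return load_factor < 0.1;
    }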

5. Progressive Rehash

To avoid the impact of rehashing on server performance, the server does not rehash all key-value pairs from ht[0] to ht[1] in one go; instead it migrates the pairs from ht[0] to ht[1] gradually, over multiple steps. Because the dictionary uses both the ht[0] and ht[1] hash tables during a progressive rehash, delete, find, and update operations are performed on both hash tables during this period. New key-value pairs added to the dictionary are stored only in ht[1], and ht[0] never gains new pairs, which guarantees that ht[0] eventually becomes empty.
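A minimal sketch of one progressive rehash step, migrating a single bucket from ht[0] to ht[1] (compact versions of the structures above; hash_fn stands in for dict->type->hashFunction, and rehash_step is an illustrative helper, not the actual dictRehash implementation):

    #include <stdint.h>
    #include <stddef.h>

    // Compact versions of the structures defined above.
    typedef struct dictEntry { void *key, *val; struct dictEntry *next; } dictEntry;
    typedef struct dictht { dictEntry **table; unsigned long size, sizemask, used; } dictht;

    // Migrate the bucket at *rehashidx from ht0 to ht1, then advance rehashidx.
    static void rehash_step(dictht *ht0, dictht *ht1, long *rehashidx,
                            uint64_t (*hash_fn)(const void *key)) {
        // Skip empty buckets.
        while ((unsigned long)*rehashidx < ht0->size &&
               ht0->table[*rehashidx] == NULL)
            (*rehashidx)++;
        if ((unsigned long)*rehashidx >= ht0->size) return;  // rehash finished

        dictEntry *entry = ht0->table[*rehashidx];
        while (entry != NULL) {
            dictEntry *next = entry->next;
            // Recompute the index against ht1's sizemask and insert at the head.
            unsigned long idx = hash_fn(entry->key) & ht1->sizemask;
            entry->next = ht1->table[idx];
            ht1->table[idx] = entry;
            ht0->used--;
            ht1->used++;
            entry = next;
        }
        ht0->table[*rehashidx] = NULL;
        (*rehashidx)++;
    }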

V. Skip Lists

A skip list is an ordered data structure that allows fast access to nodes by keeping multiple pointers to other nodes in each node. Redis uses skip lists as one of the underlying implementations of sorted set keys: when a sorted set contains a large number of elements, or when its elements are long strings, Redis uses a skip list as the underlying implementation of the sorted set key. Redis uses skip lists in only two places: to implement sorted set keys, and as an internal data structure in cluster nodes.

1. Definition

1.1 Skip list node structure

    typedef struct zskiplistNode {
        // Backward pointer
        struct zskiplistNode *backward;
        // Score
        double score;
        // Member object
        robj *obj;
        // Level array (flexible array member, so it comes last)
        struct zskiplistLevel {
            // Forward pointer
            struct zskiplistNode *forward;
            // Span
            unsigned int span;
        } level[];
    } zskiplistNode;
  • The level array can contain multiple elements, each holding a pointer to another node. The program uses these layers to speed up access to other nodes; generally, the more layers there are, the faster other nodes can be reached.

    • Each layer has a forward pointer pointing towards the tail of the list, used to traverse nodes from the head towards the tail.

    • The span property records the distance between two nodes.

      • The greater the span between two nodes, the farther apart they are.
      • A layer whose forward pointer is NULL has a span of 0, because it is not connected to any node.
  • The backward pointer is used to traverse nodes from the tail of the list towards the head: unlike forward pointers, which can skip several nodes at once, each node has only one backward pointer, so traversal can only move back one node at a time.

  • The score attribute is a double-precision floating-point number, and all nodes in the skip list are sorted by score from smallest to largest.

  • The obj attribute (member object) is a pointer to a string object, which holds an SDS value.

Within the same skip list, each node must hold a unique member object, but multiple nodes can hold the same score: nodes with the same score are ordered by their member objects in lexicographical order, with nodes whose member objects are smaller placed first (closer to the head of the list) and nodes whose member objects are larger placed later (closer to the tail).

1.2 Example

The following figure shows three nodes with 1, 3, and 5 levels respectively:

2.1 Skip list structure

    typedef struct zskiplist {
        // Head node and tail node of the skip list
        struct zskiplistNode *header, *tail;
        // Number of nodes in the skip list
        unsigned long length;
        // Level of the node with the highest level in the skip list
        int level;
    } zskiplist;
  • The header attribute points to the head node of the skip list.

  • The tail attribute points to the tail node of the skip list.

  • The length attribute records the length of the skip list, that is, the number of nodes it currently contains.

  • The level attribute records the level of the node with the highest level currently in the skip list.

The header and tail pointers point to the head and tail nodes of the skip list; using these two pointers, the program can locate the head and tail nodes with O(1) complexity. The level attribute is used to obtain, in O(1), the number of layers of the node with the greatest height in the skip list; the height of the header node is not counted.
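A sketch of how the layered forward pointers are used to search by score, under simplified assumptions (member objects and spans are ignored, the level array is fixed-size, and find_by_score is an illustrative helper, not the Redis API):

    #include <stddef.h>

    // Simplified node and list mirroring the structures above.
    typedef struct sknode {
        struct sknode *forward[32];  // one forward pointer per level
        double score;
    } sknode;

    typedef struct sklist {
        sknode *header;  // header node; its own score is never inspected
        int level;       // highest level currently in use
    } sklist;

    // Start at the highest level and drop down a level whenever the next
    // node would overshoot the target score: expected O(log N) comparisons.
    static sknode *find_by_score(const sklist *sl, double score) {
        sknode *x = sl->header;
        for (int i = sl->level - 1; i >= 0; i--) {
            while (x->forward[i] != NULL && x->forward[i]->score < score)
                x = x->forward[i];
        }
        x = x->forward[0];  // candidate node at the lowest level
        return (x != NULL && x->score == score) ? x : NULL;
    }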

2.2 Example

VI. Integer Sets

The integer set (intset) is one of the underlying implementations of set keys: Redis uses an integer set as the underlying implementation of a set key when the set contains only integer elements and the number of elements is small. An intset is an abstract data type that Redis uses to hold integer values; it can store values of type int16_t, int32_t, or int64_t and guarantees that the set contains no duplicates.

1. Definition

1.1 Integer set structure

    typedef struct intset {
        // Encoding mode
        uint32_t encoding;
        // Number of elements the set contains
        uint32_t length;
        // Array that holds the elements
        int8_t contents[];
    } intset;
  • The contents array is the underlying storage of the integer set: each element of the integer set is an item in the contents array, the items are sorted in ascending order of value, and the array contains no duplicates. The true type of the contents array depends on the value of the encoding attribute.

  • The length attribute records the number of elements in the integer set, which is also the length of the contents array.

1.2 Example

2. Upgrade

When a new element's type is wider than the type of the existing elements, the integer set must be upgraded before the new element can be added. The upgrade strategy has two benefits: it makes the integer set more flexible, and it saves as much memory as possible. The upgrade proceeds as follows (a sketch follows this list):

  1. Expand the underlying array according to the new element's type and allocate space for the new element.

  2. Convert all existing elements of the underlying array to the same type as the new element and place the converted elements in their correct positions, keeping the array ordered.

  3. Add the new element to the array at the position that keeps it ordered.
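A sketch of one upgrade case under simplified assumptions: only the int16_t to int32_t upgrade is shown, a fresh buffer is allocated for clarity (Redis instead reallocates in place and converts from back to front), and upgrade16to32_and_add is an illustrative name, not the Redis function:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    // Minimal intset sketch mirroring the structure above.
    typedef struct intset {
        uint32_t encoding;  // element width in bytes: 2, 4 or 8
        uint32_t length;
        int8_t  *contents;  // raw bytes, interpreted according to encoding
    } intset;

    // Upgrade a 16-bit intset to 32 bits and add a value that needs 32 bits.
    static int upgrade16to32_and_add(intset *is, int32_t value) {
        int16_t *old = (int16_t *)is->contents;
        int32_t *neu = malloc((is->length + 1) * sizeof(int32_t));
        if (neu == NULL) return -1;

        // Existing elements keep their relative order; they all fit in int32_t.
        for (uint32_t i = 0; i < is->length; i++)
            neu[i] = old[i];

        // A value that forces an upgrade lies outside the int16_t range, so it
        // is either smaller or larger than every existing element.
        if (value < 0) {
            memmove(neu + 1, neu, is->length * sizeof(int32_t));
            neu[0] = value;                 // smaller than everything: head
        } else {
            neu[is->length] = value;        // larger than everything: tail
        }

        free(is->contents);
        is->contents = (int8_t *)neu;
        is->encoding = sizeof(int32_t);
        is->length++;
        return 0;
    }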

3. Downgrade

Integer sets do not support downgrading: once an array has been upgraded, its encoding stays the same.

VII. Compressed Lists

The compressed list (ziplist) is one of the underlying implementations of list keys and hash keys. When a list key contains only a small number of items, and each item is either a small integer or a short string, Redis uses a compressed list as the underlying implementation of the list key. When a hash key contains only a small number of key-value pairs, and the key and value of each pair are either small integers or short strings, Redis uses a compressed list as the underlying implementation of the hash key.

1. Definition

The compressed list was developed by Redis to save memory. It is a sequential data structure made up of a series of specially encoded, contiguous memory blocks. A compressed list can contain any number of nodes (entries), and each node can hold either a byte array or an integer value.

A detailed description of each component of the compressed list:

The components of a compressed list node:

  • previous_entry_length: the previous_entry_length attribute of a node records, in bytes, the length of the previous node in the compressed list. The previous_entry_length attribute itself can be 1 or 5 bytes long. Because it records the length of the previous node, the program can compute the start address of the previous node from the start address of the current node with simple pointer arithmetic (a decoding sketch follows this list).

    • If the previous node is shorter than 254 bytes, the previous_entry_length attribute is 1 byte long: that single byte holds the length of the previous node.

    • If the previous node is 254 bytes or longer, the previous_entry_length attribute is 5 bytes long: the first byte is set to 0xFE (decimal 254), and the following four bytes hold the length of the previous node.

  • encoding: the encoding attribute records the type and length of the data held in the node's content attribute. There are two kinds of encodings, one for byte arrays and one for integers, and the encoding attribute itself can be 1, 2, or 5 bytes long.

  • content: the content attribute of a node holds the node's value, which can be a byte array or an integer; the type and length of the value are determined by the node's encoding attribute.
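A sketch of decoding the previous_entry_length field described above (the 4-byte length in the 5-byte form is assumed to be stored little-endian; decode_prevlen is an illustrative helper, not the Redis function):

    #include <stdint.h>
    #include <stddef.h>

    // Decode the previous_entry_length field at the start of an entry.
    // Returns the previous entry's length and stores the number of bytes
    // the field itself occupies (1 or 5) in *field_len.
    static uint32_t decode_prevlen(const uint8_t *p, size_t *field_len) {
        if (p[0] < 0xFE) {
            // 1-byte form: the byte itself is the previous entry's length (< 254).
            *field_len = 1;
            return p[0];
        }
        // 5-byte form: p[0] is the 0xFE marker, the next 4 bytes hold the length.
        *field_len = 5;
        return (uint32_t)p[1] |
               ((uint32_t)p[2] << 8) |
               ((uint32_t)p[3] << 16) |
               ((uint32_t)p[4] << 24);
    }

    // With the previous length known, stepping backwards is pointer arithmetic:
    //   previous_entry = current_entry - prev_len;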

