preface

Redis is an in-memory data structure storage system that can be used as a database, cache, and messaging middleware. Redis supports five common object types: String, Hash, List, Set, and Zset, and we use them a lot in our daily work. To know why, this article will take you through the underlying data structures of these five common object types.

1. Object type and code

Redis uses objects to store keys and values. In Redis, each object is represented by the redisObject structure. The redisObject structure consists of three main attributes: Type, Encoding, and PTR.

Typedef struct redisObject {// type unsigned type:4; // Unsigned encoding:4; // Pointer to the underlying data structure void * PTR; } robj;Copy the code

The type attribute records the type of the object. For Redis, the key object is always of string type, and the value object can be any supported type. Therefore, when we say which object type is used for the Redis key, we mean which object type is used for the corresponding value.

* The PTR attribute points to the underlying data structures of the object, which are determined by the Encoding attribute.

The encoding property determines the underlying data structure of the object in order to implement the same object type, supporting different underlying implementations. This allows different underlying data structures to be used in different scenarios, greatly increasing the flexibility and efficiency of Redis.

The underlying data structure will be explained later, so take a quick look here.

2. String objects

String is the object type we use most in our daily work, and its encoding can be int, RAW, or embstr. For details about the command for the string object, see Redis command-strings.

If a string object holds an integer value up to type long, the encoding type is int and the underlying data structure is directly of type long. For example, executing set Number 10086 creates an int-encoded string object as the value of the number key.

If the string object stores a string longer than 39 bytes, the encoding type is RAW and the underlying data structure is simple dynamic string (SDS). If the length is less than or equal to 39 bytes, the encoding type is EMbSTR and the underlying data structure is EMBSTR encoding SDS. Let’s take a closer look at what a simple dynamic string is.

2.1 Simple Dynamic Strings

Definition of SDS

In Redis, SDS is represented using the SDSHDR data structure:

Struct SDSHDR {// string length int len; // Number of unused bytes in the buf array int free; // A byte array to hold the string char buf[]; };Copy the code

SDS follows the convention that C strings end with a null character, and 1 byte of the saved null character is not counted in the len attribute. For example, the Redis string in an SDS might be of the following form:

SDS and C strings

C uses an array of characters of length N+1 to represent strings of length N, and the last element of the string is the null character \0. Redis uses SDS to have the following advantages over C strings:

1. Constant complexity gets the string length;

2. Prevent buffer overflow.

3. Reduce the number of memory reallocation times caused by string modification.

4. Binary security.

Constant complexity gets the length of the string

Because a C string does not record its own length, to get the length of the string, the entire string must be traversed in O(N) time. SDS uses the len attribute to record the length of the string, so the time complexity of obtaining the length of the string in SDS is O(1).

Prevent buffer overflow

Another problem with C strings not recording their length is that they can easily overflow the cache. For example, using the string concatenation function (Stract), it is easy to overwrite the original data of the character array.

Unlike C strings, SDS’s space allocation strategy completely eliminates the possibility of a cache overflow. When SDS does string augmenting, it first checks whether the current byte array is long enough. If not, automatic expansion is performed first and then string operations are performed.

Reduce the number of memory reallocations when modifying strings

Because the length of the C string is closely related to the underlying data, the program reallocates the array each time it grows or shortens a string:

1. If the string operation is to increase the size of the underlying array, the memory must be allocated to expand the size of the underlying array. Otherwise, the cache will overflow.

2. If the operation is to shorten the string, the memory needs to be reallocated to reclaim the unused space. Failure to do so will cause a memory leak.

Because memory reallocation involves complex algorithms and may require system calls, it is usually a time-consuming operation. String modification is a very frequent operation for Redis. If memory is reallocated like a C string every time, the performance impact is too great to be acceptable.

SDS disassociates the string length from the underlying data through free space. In SDS, arrays can contain unused bytes, the number of bytes recorded by the free property. SDS realizes two optimization strategies of space pre-allocation and inert space release through free space.

1. Pre-allocate space

Space preallocation is used to optimize the SDS string growth operation, which simply means that some free space is always reserved when the byte array space is insufficient to trigger the reallocation. This reduces the number of memory reallocations required for successive string growth operations.

There are two strategies for pre-allocation:

If len is less than 1MB, the same amount of free space will be allocated each time. If len is greater than or equal to 1MB: 1MB more free space is allocated for each reassignment.

2. Lazy space release

Lazy space free is used to optimize SDS string shortening operations. In simple terms, when a string is shortened, the memory is not immediately reallocated to reclaim the extra bytes, but is recorded with the free attribute, waiting for future use. SDS also provides an API to directly free up unused space and actually free up extra space when needed.

Binary security

C Characters in a string must conform to some encoding, and no null characters are allowed except at the end of the string. These limitations make C strings only hold text data.

But for Redis, you need to save not just text, but binary data. To achieve this goal, all SDS apis are binary-safe.

2.2 SDS differences between RAW and EMBSTR codes

As we mentioned earlier, strings larger than 39 bytes are encoded as RAW and the underlying data structure is simple dynamic String (SDS). This is easy to understand, for example when we execute set Story “Long, Long, Long ago there lived a king…” After that, Redis creates a raw encoded String.

The data structure is as follows:

The value is a string of 39 bytes or less. The encoding type is EMbstr and the underlying data structure is EMbSTR encoded SDS. The embSTR encoding is used to store short strings. The main difference between the EMBSTR encoding and raw encoding is that the RAW encoding calls the redisObject structure and SDSHDR structure twice. Embstr encodings call memory allocation only once and contain both redisObject and SDSHDR structures in a contiguous space.

2.3 Coding Conversion

String objects encoded by int and EMbstr are automatically converted to raw string objects if the conditions are met.

For int encoding, when we change the string to no longer be an integer value, the encoding of the string object changes from int to RAW.

For embSTR encoding, whenever we change the value of the string, the encoding of the string object changes from EMbstr to RAW.

Embstr encoded string objects can be considered read-only because Redis writes any modifications to them. When we want to modify the embSTR encoding string, we will first convert to raw encoding, and then modify.

3. List objects

List objects can be encoded as linkedList or Ziplist, and the underlying data structures are linked lists and compressed lists. For details about commands related to List objects, see Redis command -list.

By default, list objects use ziplist encoding when all string elements are less than 64 bytes long and the number of elements is less than 512, otherwise linkedList encoding is used.

This upper limit can be modified through the configuration file.

3.1 list

Linked list is a very common data structure, which provides efficient node rearrangement and sequential node access. In Redis, each linked listNode is represented using the listNode structure:

Typedef struct listNode {// struct listNode *prev; Struct listNode *next; // Node value void *value; } listNodeCopy the code

Multiple ListNodes use prev and NEXT Pointers to form a double-ended list, as shown in the figure below:

For ease of operation, Redis uses the List structure to hold linked lists.

Typedef struct list {// listNode *head; // listNode *tail; Unsigned long len; // Node copy function void *(*dup)(void * PTR); Void (*free)(void * PTR); Int (*match)(void * PTR, void *key); } list;Copy the code

The list structure provides the head pointer, tail pointer, and len length counter for the list, while the DUP, free, and match members are specific functions of the types needed to implement polymorphic lists.

The characteristics of Redis linked list implementation are summarized as follows:

1. Double-ended: Linked list nodes have prev and next Pointers, and the complexity of obtaining the pre-node and post-node of a node is O(n);

2. Loopless: the prev pointer of the head node and the next pointer of the tail node point to NULL, and the access to the linked list ends with NULL;

3. With head pointer and tail pointer: through the head pointer and tail pointer of the list structure, the complexity of the program to obtain the head node and tail node of the linked list is O(1);

4. Length counter with linked list: The program uses len attribute of the list structure to count the nodes held by the list, and the complexity of obtaining the number of nodes in the list is O(1);

5. Polymorphic: Linked list nodes use void* Pointers to hold node values, which can hold various types of values.

3.2 Compressed List

Ziplist is one of the underlying implementations of list keys and hash keys. The compressed list is a sequential data structure composed of a series of contiguous memory blocks with special encoding. A compressed list can contain any number of nodes, and each node can hold either a byte array or an integer value.

As shown in the figure above, the compressed list records the type, length, and purpose of each component.

4. Hash objects

Hash objects can be encoded ziplist or Hashtable.

4.1 the hash – ziplist

Ziplist uses a compressed list at the bottom. The implementation principle of the compressed list has been described in detail above. Each time a new key-value pair is added to the hash object, the node holding the key is pushed to the end of the compressed column table, and then the node holding the value is pushed to the end of the compressed column table. For example, we execute the following three HSET commands:

HSET profile name "tom"
HSET profile age 25
HSET profile career "Programmer"
Copy the code

If ziplist encoding is used, the Hash object has the following structure in memory:

3.2 the hash – hashtable

Hashtable encoded hash objects use dictionaries as the underlying implementation. A dictionary is a data structure used to store key-value pairs. The dictionary of Redis uses hash table as the underlying implementation. A hash table can have multiple hash table nodes, and each hash table node stores a key-value pair.

3.3 a hash table

The hash table used by Redis is defined by the dictht construct:

Typedef struct dictht{// dictEntry **table; // Hash table size unsigned long size; // Always equal to size-1 unsigned long sizemask; // Number of existing nodes in the hash table unsigned long used; } dicthtCopy the code

The table property is an array, and each element in the array is a pointer to a dictEntry structure that holds a key-value pair.

The size property records the size of the hash table, that is, the size of the table array. The used property records the number of existing nodes in the hash table. Sizemask always equals size-1. This value is used mainly for array indexes.

For example, the following figure shows an empty hash table of size 4.

Hash table node

Hash table nodes are represented by dictEntry structures, each of which holds a key-value pair:

Typedef struct dictEntry {// key void *key; // value union {void *val; unit64_t u64; nit64_t s64; } v; Struct dictEntry *next; struct dictEntry *next; } dictEntry;Copy the code

The key property holds the keys in the key-value pair, while the V property holds the values in the key-value pair. The value can be a pointer, a uint64_t integer or an int64_t integer. The next attribute points to another dictEntry node that resolves key collisions by concatenating multiple dictEntry nodes into a linked list with the same bucket bits (chain address method).

3.4 the dictionary

Redis dictionaries are represented by dict constructs:

Typedef struct dict {// type specific function dictType *type; // Private data void * privData; // Dictht ht[2]; // Rehash index // If rehash is not in progress, the value is -1 int rehashidx; }Copy the code

Ht is of size 2, and each element points to the Dictht hash table. In general, dictionaries will only use ht[0] hashes, and HT [1] hashes will only be used when rehashing HT [0] hashes. Rehashidx records the progress of rehash and has a value of -1 if no rehash has been performed.

rehash

To keep the hash table load factor (HT [0]).used/ HT [0]).size) within a reasonable range, the program needs to expand and shrink the hash table when too many or too few elements are stored.

The rehash operation is used to expand and shrink the hash table.

The steps for rehash are as follows:

  1. Allocate space for HT [1] hash table;

1. If the operation is extended, the size of HT [1] is the first 2n larger than ht[0]. Used *2. For example, ht[0]. Used =5, then ht[1] is 16 (the first 2n greater than 10 is 16).

2. If it is a shrink operation, then ht[1] is the first 2n larger than HT [0]. Used. For example, ht[0]. Used =5, then ht[1] is 8 (the first 2n greater than 5 is 8).

  1. Rehash all key-value pairs saved in HT [0] into HT [1];

  2. After the migration, release HT [0], set the current HT [1] to HT [0], and create a new blank hash table in HT [1] for the next rehash.

The timing of expansion and contraction of a hash table

1. If the server does not run the BGSAVE or BGREWRITEAOF command, the load factor is greater than or equal to 1, triggering hash table expansion.

2. When the server runs the BGSAVE or BGREWRITEAOF command, the load factor is greater than or equal to 5.

3. When the load factor of the hash table is less than 0.1, the hash table shrinks.

Progressive rehash

As mentioned earlier, to expand or shrink ht[0] requires rehashing all elements into HT [1]. If HT [0] has a large number of elements, the cost of rehashing is obviously very high, affecting Redis performance.

To solve the above problems, Redis uses a progressive rehash technique, in which elements in HT [0] are slowly rehashed into HT [1] in multiple increments.

Here are the step-by-step steps for progressive Rehash:

Allocate space for HT [1]; Maintain an index counter variable, rehashidx, in the dictionary and set its value to 0 to indicate that rehash officially begins; During rehash, every time a dictionary is added, deleted, searched or updated, in addition to the corresponding operation, all key/value pairs of HT [0] in the rehashidx index are rehashed into HT [1]. After rehash, Rehashidx + 1; As the dictionary operation continues, the migration will eventually complete at some point, at which point the rehashidx value will be set to -1 to indicate the end of rehash.

** Progressive Rehash migrates all data on one bucket at a time. ** is designed with divide-and-conquer in mind, dispersing the centralized operations over each add, delete, find, and update operation to avoid the massive computation that centralized Rehash brings.

Because the dictionary uses both HT [0] and HT [1] in progressive Rehash, dictionary deletions, look-up, and updates may be performed in both hash tables. For example, if you want to look for a key, look in HT [0] first, and continue to look in HT [1] if no key is found.

Hashtable in a hash object

HSET profile name "tom"
HSET profile age 25
HSET profile career "Programmer"

Copy the code

Save the data to the Hash object in Redis. If the Hash object is saved using hashtable encoding, the structure of the Hash object is as follows:

The picture

Ziplist encoding is used when the key and value strings of all key-value pairs held by the hash object are less than 64 bytes long and the number is less than 512, otherwise HashTable encoding is used.

This upper limit can be modified through the configuration file.

4. Collection objects

The encoding of a collection object can be intSet or HashTable. The intSet encoding is used when the collection object holds integers and the number of elements does not exceed 512; otherwise, the HashTable encoding is used.

4.1 the set – intset

Intset-encoded collection objects are implemented using integer collections at the bottom.

Intset is an abstract data structure Redis uses to store integer values. It can store integer values of type INT16_t, INT32_t, or INT64_t without duplicating the data in the set. Redis uses the intset structure to represent a collection of integers.

// uint32_t encoding; // Uint32_t length; Int8_t contents[]; } intset;Copy the code

The contents array is a low-level implementation of the contents array: each element of the integer array is an element of the contents array, and each item is ordered in order of value from smallest to largest in the array, and the array contains no duplicates.

Although the contents property is declared as an array of type INT8_T, the contents array does not actually hold any values of type INT8_T, and the actual type of value held in the array depends on encoding.

If the encoding property is INTSET_ENC_INT16, then the contents array is an array of type INT16_T, and so on.

When the type of the newly inserted element is greater than that of the existing type element in the integer collection, the integer collection must be upgraded before the new element can be added. This process is carried out in three steps:

1. Expand the size of the underlying array space of the integer set according to the new element type;

2. Convert all existing elements of the underlying array to the same type as the new element, and maintain the order of the underlying array;

3. Add the new element to the underlying array.

It is also important to note that integer collections do not support degradation. Once the array has been upgraded, the encoding remains the same.

For example, when SADD Numbers 1 3 5 is used to insert data into the collection object, the collection object has the following structure in memory:

4.2 the set – the hashtable

Hashtable encoded collection objects use dictionaries as the underlying implementation. Each key of the dictionary is a string object, each string object corresponds to a collection element, and the dictionary value is NULL.

When we execute SADD fruits “apple” “banana” “cherry” to insert data into the collection object, the collection object has the following structure in memory:

5. Ordered collection objects

The encoding for ordered collections can be ziplist or Skiplist. Ziplist encoding is used when the ordered set holds less than 128 elements and all element members are less than 64 bytes in length, otherwise skiplist encoding is used.

5.1 zset – ziplist

Ziplist-encoded ordered collections use compressed lists as the underlying implementation. Each collection element is represented by two compressed list nodes next to each other, the first node holding the element’s members (member) and the second node holding the element’s score (score).

The collection elements in the compressed list are arranged in ascending order of magnitude. If we execute ZADD price 8.5 apple 5.0 banana 6.0 cherry to insert elements into the ordered collection, the ordered collection has the following structure in memory:

5.2 zset – skiplist

Skiplist-encoded ordered collection objects use as their underlying implementation a zset structure that contains both a dictionary and a skiplist.

typedef struct zset {
    zskiplist *zs1;
    dict *dict;
}
Copy the code

Before we move on, let’s take a look at what skip lists are.

Jump table

A skiplist is an ordered data structure that enables fast access to nodes by maintaining multiple Pointers to other nodes in each node.

Redis skip lists are defined by the zskiplistNode and zskiplist structures. The zskiplistNode structure represents skip table nodes. Zskiplist stores information about skip table nodes, such as the number of nodes, Pointers to the top and bottom of the table, and so on.

Skip list node zskiplistNode

The zskiplistNode structure is defined as follows:

Typedef struct zskiplistNode {// backward pointer struct zskiplistNode *backward; // double score; // member object robj *obj; Struct zskiplistNode *forward; struct zskiplistNode *forward; // span unsigned int span; } level[]; } zskiplistNode;Copy the code

The following figure shows a skip table consisting of 4 skip table nodes (1 table head node and 3 data nodes) with a height of 5:

Skiplist implementation of ordered collection objects

As mentioned earlier, skiplist-encoded ordered collection objects use the Zset structure as the underlying implementation. A zset structure contains both a dictionary and a skip list.

typedef struct zset {
    zskiplist *zs1;
    dict *dict;
}
Copy the code

The ZS1 skip list in the ZSET structure holds all the set elements in ascending order of points, and each skip list node holds one set element.

Skip lists allow for quick score-based range lookups of ordered collections. The dict dictionary in the Zset structure creates a mapping for ordered collections from members to score values, where the keys hold the members and the values hold the score values. With a dictionary, you can look up the score of a given member with O(1) complexity.

If ZADD price 8.5 apple 5.0 banana 6.0 cherry is executed to save data to zset, the structure of the ordered set in memory is as follows if skiplist encoding is used:

6. Summary

In general, the underlying data structure of Redis mainly includes six types: simple dynamic string (SDS), linked list, dictionary, skip list, integer set and compressed list. Based on these basic data structures, five common object types including string object, list object, hash object, set object and ordered set object are realized. Each object type uses at least two data encodings, with different encodings using different underlying data structures.

The last

I have arranged a: Redis related information documents and knowledge map, Java systematic information, (including Java core knowledge points, interview topics and the latest 20 years of the Internet, e-books, etc.) friends who need to pay attention to the public number [procedures Yuan Xiao wan] can obtain.