Redis basic data structures and data types
An overview of the
- Redis as a distributed cache implementation, compared with Memecache, in addition to supporting persistence, an important feature of Redis is that Redis supports rich data types, that is, Memecache only supports string types, all key-value pairs are string types, while Redis values support strings, lists, dictionaries, collections, There are five types of ordered collections, so they can provide richer operations.
- Each data type of Redis supports multiple underlying data structure implementations, that is, each data type is not bound to a single data structure, but can be multiple. The reason for this design is that Redis is an in-memory database, and all data is stored in memory. Therefore, in the design process, it is necessary to reduce memory consumption as much as possible on the premise of ensuring performance, and balance between memory and performance, so as to store more data. So before we get into data types, let’s take a look at the underlying data structures of Redis.
The data structure
Simple dynamic string
-
Redis is written in C language, Redis does not directly use the string of C language, but based on the string of C language to achieve a simple dynamic string. The main reason is that the string of C is a low-level API, which has the following problems :(1) there is no record of the string length, so it requires O(n) complexity to obtain the string length; (2) Because the string length is not recorded, it is prone to buffer overflow; (3) System call is needed every time to expand the string capacity, there is no reserved space (4) C string can only save characters.
-
Based on the above problems, the simple dynamic string of Redis provides :(1) the length of string is recorded by len to achieve O(1) complexity; (2) The internal string array is reserved to reduce the number of memory reallocation of strings, and at the same time to achieve automatic expansion to avoid buffer overflow problem; (3) The internal character array is byte based to store data, so it can save characters and binary data. Specific data structure design is as follows:
Struct __attribute__ ((__packed__)) sdshdr64 {// /* uint64_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 LSB of type, 5 unused bits */ // char buf[]; };Copy the code
Two-way linked list
-
C language does not provide the implementation of bidirectional linked list, so Redis itself implements a bidirectional linked list, which is mainly used to store list data. The specific data structure is as follows:
-
List node definitions:
Typedef struct listNode {// struct listNode *prev; Struct listNode *next; Void *value; void *value; void *value; } listNode;Copy the code
-
Linked list definitions:
Typedef struct list {// listNode *head; // listNode *tail; void *(*dup)(void *ptr); void (*free)(void *ptr); int (*match)(void *ptr, void *key); // List length unsigned long len; } list;Copy the code
Chain hash dictionary
-
The hash table of Redis is implemented as a chained hash, that is, the conflicting nodes are maintained by a linked list. The specific data structure is as follows:
-
Hash list nodes:
// typef struct dictEntry {// key key void *key; Union {// value value void *val; uint64_t u64; int64_t s64; double d; } v; Struct dictEntry *next; } dictEntry;Copy the code
-
Chained hash table implementation:
// typepedef struct dictht {// dictEntry **table; // Number of hash elements unsigned long size; // Hash mask, used to calculate hash indexes based on ampersand, always siz-1 unsigned long sizemask; unsigned long used; } dictht;Copy the code
-
Redis hash structure:
typedef struct dict { dictType *type; void *privdata; // Two hash tables, one of which is empty and uses dictht ht[2] for rehash; long rehashidx; /* rehashing not in progress if rehashidx == -1 */ unsigned long iterators; /* number of iterators currently running */ } dict;Copy the code
Ziplist ziplist
-
Compressed lists are designed to save memory and are sequential data structures composed of contiguous chunks of specially encoded memory.
-
Compressed lists are used when the list has few elements and the element types are small integers or short strings. Because the data access of the compressed list is generally traversal, that is, linear O(n) time complexity, but with fewer elements, there is no performance problem, similar to the constant O(1) time complexity.
-
The overall structure of the compressed list is as follows: it mainly includes the total number of bytes of the list zlbytes, the number of bytes between the end node of the list and the start address of the list zLTail, the total number of elements of the list zllen, and the list element data node entry (the picture is taken from The Design and Implementation of Redis by Huang Jianhong).
-
Element nodes of the element list: each node contains the size of the previous node, so that the position of the previous node can be calculated based on the position of the current element to achieve traversal; Encoding Indicates the data type and length of the value stored by content, which is the specific value.
/* We use this function to receive information about a ziplist entry. * Note that this is not how the data is actually encoded, is just what we * get filled by a function in order to operate more easily. */ typedef struct zlentry { unsigned int prevrawlensize; /* Bytes used to encode the previos entry len*/ unsigned int prevrawlen; /* Previous entry len. */ unsigned int lensize; /* Bytes used to encode this entry type/len. For example strings have a 1, 2 or 5 bytes header. Integers always use a single byte.*/ unsigned int len; /* Bytes used to represent the actual entry. For strings this is just the string length while for integers it is 1, 2, 3, 4, 8 or 0 (for 4 bit immediate) depending on the number range. */ unsigned int headersize; /* prevrawlensize + lensize. */ unsigned char encoding; /* Set to ZIP_STR_* or ZIP_INT_* depending on the entry encoding. However for 4 bits immediate integers this can assume a range of values and must be range-checked. */ unsigned char *p; /* Pointer to the very start of the entry, that is, this points to prev-entry-len field. */ } zlentry; Copy the code
Integer set intset
-
A set of integers is a data structure of a set. This data structure is used when there are only integer elements in the set and the number of elements in the set is small.
-
The specific data structure is defined as follows:
typedef struct intset { uint32_t encoding; uint32_t length; Int int8_t contents[]; } intset;Copy the code
Jump table skiplist
-
Skip list is a special linked list structure. Each linked list node contains Pointers to other linked list nodes. Therefore, other linked list nodes can be accessed quickly without sequential traversal. The average complexity of reading and writing skip lists is O(logN), so the tree structure is similar, but the structure is simpler than the tree structure.
-
In Redis, skip lists are mainly used to implement zset, an ordered set. Each linked list node contains member objects and scores. From start node to end node, in ascending order of points.
-
Skip list definition:
Typedef struct zskiplist {// struct zskiplistNode *header, *tail; // Total number of linked list nodes unsigned long length; Int level; } zskiplist;Copy the code
-
Skip list node definition:
Typedef struct zskiplistNode {// member object SDS ele; // double score; Struct zskiplistNode *backward; Struct zskiplistNode *forward; // The span of the forward pointer relative to the current node layer unsigned int span; } level[]; } zskiplistNode;Copy the code
The data type
-
The following data types are provided for application use and are encoded using one or more of the above data structures in the underlying implementation.
-
For each key, use the TYPE command to check the data TYPE of its value. Use OBJECT ENCODING to check the underlying ENCODING TYPE, as follows:
127.0.0.1:6379> Set test 123 OK 127.0.0.1:6379> Type test string 127.0.0.1:6379> Object Encoding test "int" 127.0.0.1:6379 >Copy the code
String string
- In Redis, the corresponding integer type is stored as long if it does not exceed the range of LONG in C language. If it does, embstr or RAW string is used to store it. The corresponding strings and floating point numbers are stored using embstr or RAW strings.
- Int: An integer that can be held using type long.
- Embstr: If the number of bytes in a string is less than or equal to 32, embstr is used to encode the string. Note that integers cannot be stored using the long type. Floating point numbers are stored using strings in Redis.
- Raw: The string that embstr cannot store, that is, the string with more than 32 bytes, is encoded using RAW.
- Embstr, like raw encoding, is used to store strings. Embstr is an optimization of RAW for storing short characters. Embstr is mainly used to store short strings. In terms of memory allocation, emBSTR uses a memory allocation to create redisObject, the corresponding object of data type String in Redis, and SDS, a simple dynamic string used to store data, while RAW needs to allocate large SDS. Therefore, two memory allocations are used to create the above two objects respectively.
List the list
- Lists are mainly used to store multiple identical or different elements. In terms of the underlying coding, there are two data structures, compressed list and bidirectional linked list.
- Selection rule: When all string elements in the list are less than 64 bytes long and the number of elements in the list is less than 512, use the compressed list; otherwise, use the bidirectional list.
- The above is the default rule. The switching threshold: string length and number of list elements can be changed by using list-max-ziplist-value and list-max-ziplist-entries in the redis.conf configuration file.
The dictionary hash
- The dictionary is mainly used to store key-value pair data, and there are two data structures of compressed list and chained hash table in the underlying coding.
- Selection rule: If the string length of the key and value of the hash object is less than 64 bytes and the number of all hash objects is less than 512, use the compressed list, otherwise use the chained hash table.
- This is the default rule and can be changed using the hash-max-ziplist-value and hash-max-ziplist-entries parameters in redis.conf.
Set the set
- Set is a set that stores no repeating elements. There are two data structures, integer set and chained hash table, in the underlying coding.
- Selection rule: If all elements in the set are integer values (int8) and the number of elements does not exceed 512, use intset, otherwise use chained hash table.
- This is the default rule, of which 512 entries can be changed using set-max-intset-entries in redis.conf.
Ordered set Zset
- The ordered set mainly realizes the function of sorting by score, in which score can be repeated, but member object member cannot be repeated, that is, the ordered set is oriented to score score, and the set is oriented to member object member.
- There are also two data structures, compressed list and skip list, in the underlying coding.
- Selection rule: If the ordered set has less than 128 elements and each element’s member object is less than 64 bytes long, use the compressed list, otherwise use the skip list.
- The preceding rules can be modified by using zset-max-ziplist-entires and zset-max-ziplist-value in redis.conf.
The statement
Ps: I mixed a child programmer, if there is an error in the article please inform us, in addition, this article is only used for learning, refer to the following article, if offended please inform me to delete: Redis data type and data structure analysis