If you’ve used Redis, you are probably as curious about its internals as I am. “Redis Internal Data Structure Details” is a series I plan to write; it is also a summary of my research into Redis so far, focusing on how Redis implements its in-memory data structures (persistence is not covered). Redis is essentially a data structure server that implements a variety of ready-made data structures in an efficient way. Studying these data structures and the algorithms built on them is of great reference value for improving our own data structure and algorithm skills.

When we refer to Redis’s “data structure” in this article, we may be talking about it on two different levels.

The first level is from the user’s perspective. Such as:

  • string

  • list

  • hash

  • set

  • sorted set

This layer is also the invocation interface that Redis exposes to the outside world.

The second level is the internal implementation, at a lower level of abstraction. Such as:

  • dict

  • sds

  • ziplist

  • quicklist

  • skiplist

The first level of “data structures” is covered in detail in the official Redis documentation (http://redis.io/topics/data-types-intro). The focus of this article is the second level: how the internal data structures of Redis are implemented, and how the two levels relate to each other, that is, how Redis combines the basic data structures of the second level to build the higher-level data structures of the first.

When discussing the internal implementation of any system, it helps to first establish its design principles, so that we can better understand why it is designed the way it is. In the following discussion, we will focus on these points:

  • Memory efficiency. Redis is dedicated to storing data, and the computer resource it consumes most is memory, so saving memory is very important to it. This means Redis must think very carefully about things like compacting data and reducing memory fragmentation.

  • Fast response time. The competing goal of fast response time is high throughput. Redis is designed for online access, with demanding response-time requirements for individual requests, so fast response time is a more important goal than high throughput. Sometimes the two goals are at odds.

  • Single-threaded. Redis’s performance bottleneck is not CPU, but memory access and network I/O. The advantage of a single-threaded design is that it greatly simplifies the implementation of its data structures and algorithms. Instead, Redis achieves high-speed concurrent access through mechanisms such as asynchronous I/O and pipelining. Obviously, the single-threaded design also places higher demands on the response time of individual requests.

This article, the first in the Redis Internal Data Structure Details series, looks at one of Redis’s most important underlying data structures: dict.

A dict is a data structure for maintaining mappings from keys to values, similar to the maps or dictionaries found in many languages. Redis uses a dict to maintain all of the key-to-value mappings in a database, but that is only one of its uses; dicts appear in many other places as well. For example, a Redis hash with a large number of fields is stored in a dict, and Redis maintains a sorted set using a dict together with a skiplist. We will discuss those details later; in this article, we focus on the implementation of dict itself.

A dict essentially solves the searching problem from algorithm theory. Solutions to the searching problem generally fall into two categories: those based on various balanced trees, and those based on hash tables. The maps or dictionaries we commonly use are mostly based on hash tables. Provided that the data does not need to be stored in order and the probability of hash collisions is kept low, hash-table-based lookup is very efficient, close to O(1), and simple to implement.

In Redis, dict is likewise a hash-table-based implementation. Like a traditional hash table, it uses a hash function to compute a key’s position in the table, resolves collisions by chaining (separate chaining), and automatically expands memory when the load factor exceeds a preset threshold, triggering a rehash. The most notable feature of the Redis dict implementation is its rehash: incremental rehashing is used to avoid rehashing all keys at once when the table needs to grow. Instead, the rehash work is spread across subsequent insert, delete, and lookup operations on the dict, so that only a small number of keys are rehashed at a time and dict operations in between are not blocked. This design avoids a dramatic increase in the response time of individual requests during a rehash, which is consistent with the “fast response time” principle mentioned earlier.

The details are described below.


Dict data structure definition

To support incremental rehashing, the dict data structure contains two hash tables. During a rehash, data is migrated from the first hash table to the second.

The C definition of dict is as follows (from the Redis source file dict.h):

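(The listing below is a paraphrase of the definitions in dict.h, roughly as of Redis 3.x; exact field types and ordering may differ slightly between Redis versions.)

```c
/* Paraphrased from dict.h; not the verbatim source. */
typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;   /* next entry in the same bucket (chaining) */
} dictEntry;

typedef struct dictType {
    unsigned int (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
} dictType;

/* One hash table; a dict holds two of these for incremental rehashing. */
typedef struct dictht {
    dictEntry **table;        /* bucket array */
    unsigned long size;       /* number of buckets, always a power of 2 */
    unsigned long sizemask;   /* size - 1, used to map a hash to a bucket */
    unsigned long used;       /* number of entries stored */
} dictht;

typedef struct dict {
    dictType *type;           /* key/value callbacks */
    void *privdata;           /* passed back to the dictType callbacks */
    dictht ht[2];             /* ht[1] is only used while rehashing */
    long rehashidx;           /* -1 when no rehash is in progress */
    int iterators;            /* number of iterators currently running */
} dict;
```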


To show dict’s data structure definition more clearly, we can represent it with a structure diagram, as follows.



[Figure: structure diagram of a dict in the middle of a rehash, with data in both ht[0] and ht[1]]


Combined with the code and structure diagram above, it’s clear what dict looks like. A dict consists of the following items:

  • A pointer to a dictType structure (type). It allows the keys and values of a dict to store any type of data, in a way customized by the caller.

  • A private data pointer (privdata), passed in by the caller when the dict is created.

  • Two hash tables (ht[2]). Both ht[0] and ht[1] are in use only during a rehash. In the normal case, only ht[0] holds data and ht[1] is empty. The figure above shows a dict that is somewhere in the middle of a rehash.

  • The current rehash index (rehashidx). If rehashidx == -1, no rehash is in progress; otherwise a rehash is underway, and its value records how far the rehash has progressed.

  • The number of currently running iterators (iterators). This is not relevant to the present discussion, so we ignore it here.

The dictType structure contains several function pointers that let the caller of a dict customize the operations involving keys and values. These operations include:

  • hashFunction, the hash algorithm used to compute a hash value from a key.

  • keyDup and valDup, which define copy functions for keys and values respectively; they are used to make deep copies of keys and values when needed, rather than merely storing pointers to the caller’s objects.

  • keyCompare, which defines how two keys are compared; it is used when looking up an entry by key.

  • keyDestructor and valDestructor, which define destructors for keys and values respectively.

The privdata pointer is passed back to these dictType callbacks when they are invoked.
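For illustration, a hypothetical dictType for NUL-terminated C-string keys might look like the sketch below. The callback signatures follow the definitions shown above; the hash function is a toy placeholder (Redis actually uses MurmurHash2 or SipHash, depending on the version), and all of the example* names are invented for this article.

```c
#include <stdlib.h>
#include <string.h>

/* Toy hash for illustration only. */
static unsigned int exampleStringHash(const void *key) {
    const unsigned char *p = key;
    unsigned int h = 5381;
    while (*p) h = ((h << 5) + h) ^ *p++;      /* djb2-style mixing */
    return h;
}

static void *exampleStringDup(void *privdata, const void *key) {
    (void)privdata;                             /* unused in this example */
    return strdup(key);                         /* deep-copy the key */
}

static int exampleStringCompare(void *privdata, const void *k1, const void *k2) {
    (void)privdata;
    return strcmp(k1, k2) == 0;                 /* non-zero means "equal" */
}

static void exampleStringDestructor(void *privdata, void *key) {
    (void)privdata;
    free(key);
}

/* Values are plain pointers owned by the caller, so valDup/valDestructor are NULL. */
static dictType exampleStringDictType = {
    exampleStringHash,        /* hashFunction */
    exampleStringDup,         /* keyDup */
    NULL,                     /* valDup */
    exampleStringCompare,     /* keyCompare */
    exampleStringDestructor,  /* keyDestructor */
    NULL                      /* valDestructor */
};
```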

The dictht structure deserves a closer look. It defines a single hash table, consisting of the following items:

  • An array of dictEntry pointers (table). The hash of a key is eventually mapped to a position in this array (a bucket). If multiple keys map to the same position, a collision occurs, and a linked list of dictEntry nodes is chained off that bucket.

  • size, the length of the dictEntry pointer array. It is always a power of 2.

  • sizemask, used to map a hash value to a position in table. Its value is always size - 1, e.g. 7, 15, 31, 63, ..., which in binary is all 1s. Each key is run through hashFunction to get a hash value, and (hash & sizemask) then gives its position in table, which is equivalent to taking the remainder (hash % size). A short check of this equivalence follows the list.

  • used, which records the number of entries stored in the dict. The ratio of used to size is the load factor; the larger it is, the higher the probability of hash collisions.
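As a quick sanity check of the masking trick described above (a standalone snippet written for this article, assuming size is a power of 2):

```c
#include <assert.h>
#include <stdio.h>

int main(void) {
    unsigned long size = 8;               /* must be a power of 2 */
    unsigned long sizemask = size - 1;    /* binary 0111 */
    unsigned long hash = 0xBEEF;          /* pretend hashFunction returned this */

    /* For power-of-2 sizes, masking and modulo select the same bucket. */
    assert((hash & sizemask) == (hash % size));
    printf("bucket index = %lu\n", hash & sizemask);
    return 0;
}
```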

The dictEntry structure contains the key, the value (v), and a next pointer to the next entry in the bucket’s list. The key is a void pointer, which means it can point to any type. The value v is a union: when it holds a uint64_t, an int64_t, or a double, no extra storage is needed, which helps reduce memory fragmentation. Of course, v can also be a void pointer, storing data of any type.


dictCreate


dictCreate allocates space for the dict structure and gives each field its initial value. The two hash tables ht[0] and ht[1] are not allocated at this point; their table pointers are NULL. This means that no table space is actually allocated until the first entry is inserted.
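A simplified sketch of what dictCreate does is shown below (paraphrased from dict.c, which splits the work into small helpers and allocates with Redis’s own zmalloc; standard malloc is used here to keep the sketch generic):

```c
/* Simplified sketch; assumes the struct definitions shown earlier and <stdlib.h>. */
static void _dictReset(dictht *ht) {
    ht->table = NULL;               /* no bucket array allocated yet */
    ht->size = 0;
    ht->sizemask = 0;
    ht->used = 0;
}

dict *dictCreate(dictType *type, void *privDataPtr) {
    dict *d = malloc(sizeof(*d));   /* Redis uses zmalloc() */
    if (d == NULL) return NULL;
    _dictReset(&d->ht[0]);          /* both tables start empty: table == NULL */
    _dictReset(&d->ht[1]);
    d->type = type;
    d->privdata = privDataPtr;
    d->rehashidx = -1;              /* not rehashing */
    d->iterators = 0;
    return d;
}
```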


Lookup (dictFind)


dictFind performs the following steps, depending on whether the dict is currently rehashing (a simplified sketch follows the list):

  • If a rehash is currently in progress, push it one step forward (that is, call _dictRehashStep). In fact, besides lookup, insert and delete also trigger this action. This is what spreads the rehash work across the various find, insert, and delete operations instead of doing it all at once.

  • Compute the hash of the key (call dictHashKey, whose implementation calls the aforementioned hashFunction).

  • Look in the first hash table ht[0] first. Map the hash to a position in the table array (hash & sizemask, as mentioned earlier), then search the dictEntry list at that position. Key comparison is done through dictCompareKeys, whose implementation calls the keyCompare mentioned earlier. If the entry is found, return it; otherwise, proceed to the next step.

  • Check whether a rehash is in progress. If not, the result of the search on ht[0] is final (NULL is returned if nothing was found). Otherwise, perform the same lookup on ht[1].
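Putting these steps together, the lookup logic looks roughly like the following (a condensed paraphrase of dictFind in dict.c, not the verbatim source; dictIsRehashing, dictHashKey, and dictCompareKeys are the dict.h macros mentioned above):

```c
/* Condensed sketch of dictFind. */
dictEntry *dictFind(dict *d, const void *key) {
    if (d->ht[0].size == 0) return NULL;           /* dict is empty */
    if (dictIsRehashing(d)) _dictRehashStep(d);    /* push the rehash one step forward */

    unsigned int h = dictHashKey(d, key);          /* calls type->hashFunction */
    for (int table = 0; table <= 1; table++) {
        unsigned int idx = h & d->ht[table].sizemask;
        dictEntry *he = d->ht[table].table[idx];
        while (he) {                               /* walk the bucket's chain */
            if (dictCompareKeys(d, key, he->key))  /* calls type->keyCompare */
                return he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) return NULL;      /* only check ht[1] mid-rehash */
    }
    return NULL;
}
```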

Next, we need to look at how _dictRehashStep implements the incremental rehash.



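Roughly speaking, _dictRehashStep simply calls dictRehash with n = 1, provided no iterators are currently active. The sketch below is a condensed paraphrase of the two functions in dict.c (not the verbatim source; the n*10 cap on empty buckets matches the behavior described next and appears in newer Redis versions):

```c
/* Condensed sketch of the incremental rehash. */
static void _dictRehashStep(dict *d) {
    if (d->iterators == 0) dictRehash(d, 1);    /* move (at most) one bucket */
}

int dictRehash(dict *d, int n) {
    int empty_visits = n * 10;                  /* cap on empty buckets visited */
    if (!dictIsRehashing(d)) return 0;

    while (n-- && d->ht[0].used != 0) {
        /* Skip empty buckets, but do not scan too many in one call. */
        while (d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        /* Move every entry in this bucket from ht[0] to ht[1]. */
        dictEntry *de = d->ht[0].table[d->rehashidx];
        while (de) {
            dictEntry *nextde = de->next;
            unsigned int h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];       /* insert at the head of the new bucket */
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* If ht[0] has been fully drained, ht[1] becomes the main table. */
    if (d->ht[0].used == 0) {
        free(d->ht[0].table);                   /* Redis uses zfree() */
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;                               /* rehash finished */
    }
    return 1;                                   /* more buckets remain */
}
```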

dictRehash moves the rehash forward by at most n steps per call (it finishes early if the whole rehash completes in fewer than n steps). Each step migrates every dictEntry in one bucket of ht[0] (that is, one dictEntry linked list) over to ht[1]; each entry’s new position in ht[1] is recomputed using the sizemask of ht[1]. rehashidx records the first bucket of ht[0] that has not yet been migrated.

If the bucket that rehashidx points to holds no dictEntry when dictRehash is called, there is nothing to migrate from it, and dictRehash scans forward through the ht[0].table array until it finds the next bucket that holds data. If it keeps finding empty buckets, it scans at most n*10 of them before the current rehash call ends temporarily.

Finally, if all data in ht[0] has been migrated to ht[1] (that is, d->ht[0].used == 0), the whole rehash is finished: ht[0] takes over the contents of ht[1], and ht[1] is reset to empty.

Looking back at the structure diagram earlier, the dict shown there is in exactly such an intermediate state: the first two buckets of ht[0] (ht[0].table[0] and ht[0].table[1]) have already been migrated to ht[1].


dictAdd and dictReplace

dictAdd inserts a new key-value pair; if the key already exists, the insertion fails.

dictReplace also inserts a key-value pair, but if the key already exists, it updates the value instead.

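The sketch below is a condensed paraphrase of dictAdd and its helper dictAddRaw (not the verbatim dict.c source; dictSetKey and dictSetVal are the dict.h macros that apply keyDup and valDup when they are provided):

```c
/* Condensed sketch of dictAdd / dictAddRaw. */
dictEntry *dictAddRaw(dict *d, void *key) {
    if (dictIsRehashing(d)) _dictRehashStep(d);   /* push the rehash one step forward */

    /* _dictKeyIndex returns -1 if the key already exists; it may also call
     * _dictExpandIfNeeded and thereby start a rehash. */
    int index = _dictKeyIndex(d, key);
    if (index == -1) return NULL;

    /* While rehashing, new entries always go into ht[1]. */
    dictht *ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    dictEntry *entry = malloc(sizeof(*entry));    /* Redis uses zmalloc() */
    entry->next = ht->table[index];               /* insert at the head of the chain */
    ht->table[index] = entry;
    ht->used++;

    dictSetKey(d, entry, key);                    /* applies keyDup if provided */
    return entry;
}

int dictAdd(dict *d, void *key, void *val) {
    dictEntry *entry = dictAddRaw(d, key);
    if (!entry) return DICT_ERR;                  /* the key is already present */
    dictSetVal(d, entry, val);                    /* applies valDup if provided */
    return DICT_OK;
}
```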


The sketch above captures the key parts of the dictAdd implementation. We need to pay attention to the following points:

  • dictAdd, too, pushes the rehash one step forward (_dictRehashStep).

  • If a rehash is in progress, the new entry is inserted into ht[1]; otherwise it goes into ht[0].

  • Within the chosen bucket, the new entry is always inserted at the head of the dictEntry list. Since newly inserted data is more likely to be accessed again soon, this keeps the number of comparisons on subsequent lookups low.

  • _dictKeyIndex finds the position in the dict to insert into. If no rehash is in progress, it only searches ht[0]; otherwise it searches both ht[0] and ht[1].

  • _dictKeyIndex may trigger an expansion of the dict’s memory (_dictExpandIfNeeded, which doubles the length of the hash table; see the dict.c source for details).

dictReplace is implemented on top of dictAdd, as follows:


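A condensed paraphrase of dictReplace (not the verbatim source; dictFreeVal is the dict.h macro that calls valDestructor when it is set):

```c
/* Condensed sketch of dictReplace. */
int dictReplace(dict *d, void *key, void *val) {
    /* Try the common case first: the key is new and dictAdd succeeds. */
    if (dictAdd(d, key, val) == DICT_OK) return 1;

    /* The key already exists: find its entry and swap in the new value.
     * The new value is set before the old one is freed, in case the two
     * are reference-counted and happen to be the same object. */
    dictEntry *entry = dictFind(d, key);
    dictEntry auxentry = *entry;     /* keep a copy holding the old value */
    dictSetVal(d, entry, val);
    dictFreeVal(d, &auxentry);
    return 0;
}
```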

When the key already exists, dictReplace ends up calling both dictAdd and dictFind, which amounts to two lookups of the same key. Here the Redis code is not written as optimally as it could be.


Deletion (dictDelete)

We will not walk through the implementation of dictDelete here; see dict.c for the details. A few points are worth noting (a condensed sketch follows the list):

  • dictDelete also pushes the rehash one step forward (_dictRehashStep).

  • If no rehash is in progress, it looks for the key to delete only in ht[0]; otherwise it looks in both ht[0] and ht[1].

  • On successful deletion, the destructors for the key and value (keyDestructor and valDestructor) are called.
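For reference, the overall shape of the deletion code is roughly the following (a condensed paraphrase of dictGenericDelete in dict.c, ignoring the "no free" variant used by dictDeleteNoFree):

```c
/* Condensed sketch of dictDelete. */
int dictDelete(dict *d, const void *key) {
    if (d->ht[0].size == 0) return DICT_ERR;      /* dict is empty */
    if (dictIsRehashing(d)) _dictRehashStep(d);   /* push the rehash one step forward */

    unsigned int h = dictHashKey(d, key);
    for (int table = 0; table <= 1; table++) {
        unsigned int idx = h & d->ht[table].sizemask;
        dictEntry *he = d->ht[table].table[idx], *prev = NULL;
        while (he) {
            if (dictCompareKeys(d, key, he->key)) {
                /* Unlink the entry from the bucket's chain. */
                if (prev) prev->next = he->next;
                else d->ht[table].table[idx] = he->next;
                dictFreeKey(d, he);               /* calls keyDestructor if set */
                dictFreeVal(d, he);               /* calls valDestructor if set */
                free(he);                         /* Redis uses zfree() */
                d->ht[table].used--;
                return DICT_OK;
            }
            prev = he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;           /* only check ht[1] mid-rehash */
    }
    return DICT_ERR;                              /* key not found */
}
```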


           



The implementation of dict is relatively simple, and that is all we will cover in this article. In the next article we will look at sds, the dynamic string implementation in Redis. Stay tuned.
