Redis can be thought of as a dictionary that stores mappings between keys and values. Regardless of the data type, every value is ultimately stored in this one large dictionary, with its key serving as the index.
Redis contains multiple databases (the maximum number can be changed in the configuration file), and each database is backed underneath by a redisDb structure. A Redis client stores data into Redis by interacting with the redisDb of the database instance it has selected.
typedef struct redisDb {
    dict *dict;                   /* The keyspace for this DB */
    dict *expires;                /* Timeout of keys with a timeout set */
    dict *blocking_keys;          /* Keys with clients waiting for data (BLPOP) */
    dict *ready_keys;             /* Blocked keys that received a PUSH */
    dict *watched_keys;           /* WATCHED keys for MULTI/EXEC CAS */
    int id;                       /* Database ID */
    long long avg_ttl;            /* Average TTL, just for stats */
    unsigned long expires_cursor; /* Cursor of the active expire cycle. */
    list *defrag_later;           /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;
redisDb has a dict *dict field, which stores the data in the database. This dict is implemented on top of a hashtable. Note that Redis also uses "hashtable" as the name of an encoding for some value types; the hashtable discussed here is the generic data structure, which in the Redis source is the dict. Unless otherwise specified, the hashtables mentioned below all refer to the data structure.
Dict data structure
- dictht & dictEntry
In Redis, the hashtable is concretely implemented by dictht. Its table field is an array of dictEntry pointers; each dictEntry is a node in a singly linked list and represents one record in the hashtable. It contains the record's key and value, plus a pointer to the next node, i.e. the next record chained into the same bucket by a hash collision.
typedef struct dictht {
    dictEntry **table;      /* Array of bucket heads (linked lists) */
    unsigned long size;     /* Number of buckets */
    unsigned long sizemask; /* Always size - 1; used to compute a key's bucket index */
    unsigned long used;     /* Number of stored elements */
} dictht;

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; /* Next entry in the same bucket (hash collision) */
} dictEntry;
Note that both key and val in dictEntry are void * pointers, which means any data type can be stored. So even though some Redis data types are themselves implemented on top of a hashtable (set, zset, hash, etc.), their values can still be stored in the dict inside redisDb.
size in dictht is the number of buckets in the hashtable, and sizemask is size - 1, a bitmask used to compute which bucket a given key belongs to (a bit operation is cheaper than a modulo operation). used records the number of records (elements) actually stored in the hashtable.
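A quick standalone illustration of why the mask works (this snippet is not Redis source): when size is a power of two, hash & sizemask selects the same bucket as hash % size, but with a single AND instruction.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    unsigned long size     = 16;       /* number of buckets; always 2^n in dict */
    unsigned long sizemask = size - 1; /* 0x0F */
    uint64_t hash = 0xDEADBEEF;        /* pretend hashFunction() returned this */
    /* Both lines print 15: the AND replaces the division */
    printf("hash & sizemask = %lu\n", (unsigned long)(hash & sizemask));
    printf("hash %% size     = %lu\n", (unsigned long)(hash % size));
    return 0;
}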
- dict & dictType
Since Redis is a single-process, single-threaded model, ht in the dict structure is defined as an array holding two dictht, so that hash expansion and rehash do not block the service. rehashidx is the index of the bucket in ht[0] currently being migrated during rehash, and pauserehash indicates whether rehash is paused.
typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx;       /* rehashing not in progress if rehashidx == -1 */
    int16_t pauserehash;  /* If >0 rehashing is paused (<0 indicates coding error) */
} dict;
dictType defines a set of callback functions that together determine how keys and values behave in a dict. In practice, Redis plugs in different dictType implementations for different data types and scenarios; for example, the dict behind the set type has keys but no values.
typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
    int (*expandAllowed)(size_t moreMem, double usedRatio);
} dictType;

extern dictType objectKeyPointerValueDictType;
extern dictType objectKeyHeapPointerValueDictType;
extern dictType setDictType;
extern dictType zsetDictType;
extern dictType clusterNodesDictType;
extern dictType clusterNodesBlackListDictType;
extern dictType dbDictType;
extern dictType shaScriptObjectDictType;
extern dictType hashDictType;
extern dictType replScriptCacheDictType;
extern dictType dbExpiresDictType;
extern dictType modulesDictType;
extern dictType sdsReplyDictType;

dictType setDictType = {
    dictSdsHash,        /* hash function */
    NULL,               /* key dup */
    NULL,               /* val dup */
    dictSdsKeyCompare,  /* key compare */
    dictSdsDestructor,  /* key destructor */
    NULL                /* val destructor */
};
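These callbacks are consulted through helper macros such as dictSetKey and dictSetVal (used by dictAdd below). As a sketch of the mechanism, here is dictSetVal as it appears in the Redis 6.x dict.h: if the dictType registers a valDup callback the value is deep-copied, otherwise the raw pointer is stored.

#define dictSetVal(d, entry, _val_) do { \
    if ((d)->type->valDup) \
        (entry)->v.val = (d)->type->valDup((d)->privdata, _val_); \
    else \
        (entry)->v.val = (_val_); \
} while(0)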
⒈ Dict creation
Creating a dict initializes only the data structures, not the buckets of the hashtable.
// Allocate memory space for the dict
dict *dictCreate(dictType *type, void *privDataPtr)
{
    dict *d = zmalloc(sizeof(*d));

    _dictInit(d,type,privDataPtr);
    return d;
}

// Initialize the dict
int _dictInit(dict *d, dictType *type, void *privDataPtr)
{
    _dictReset(&d->ht[0]);
    _dictReset(&d->ht[1]);
    d->type = type;
    d->privdata = privDataPtr;
    d->rehashidx = -1;
    d->pauserehash = 0;
    return DICT_OK;
}

// Initialize a dictht
static void _dictReset(dictht *ht)
{
    ht->table = NULL;
    ht->size = 0;
    ht->sizemask = 0;
    ht->used = 0;
}
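A minimal usage sketch (the caller code here is hypothetical, assuming the setDictType shown earlier and the sds string library): no buckets exist until the first dictAdd() triggers the initial expansion, and the dict owns the sds key because setDictType registers dictSdsDestructor.

/* Hypothetical usage, not Redis source */
dict *d = dictCreate(&setDictType, NULL);  /* structures only, no buckets yet */
dictAdd(d, sdsnew("member"), NULL);        /* set-style entry: key only, NULL value */
dictRelease(d);                            /* frees each entry and key (via keyDestructor), then the dict */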
⒉ Adding elements
To add an element to a dict, a dictEntry is first allocated for the new element in the table of a dictht, and the newly allocated dictEntry is then populated with data.
If the dict is rehashing, the new dictEntry is allocated in dict.ht[1]; otherwise it goes into dict.ht[0].
Additionally, for a freshly initialized dict, the buckets are created together with the first insertion.
// Add an element to the dict
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key,NULL);

    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}

// Allocate the dictEntry
dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    long index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);

    /* Get the index of the new element, or -1 if
     * the element already exists. */
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1)
        return NULL;

    /* Allocate the memory and store the new entry.
     * Insert the element in top, with the assumption that in a database
     * system it is more likely that recently added entries are accessed
     * more frequently. */
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);
    return entry;
}
When allocating the dictEntry, dictHashKey() is first used to compute the hash of the new element's key, and _dictKeyIndex() then finds the index of the bucket the new element should go into. During this step, if the dict is newly created, dict.ht[0].table (the buckets of the hashtable) is initialized with a length of 4. If the dict is currently rehashing, the new dictEntry is allocated in dict.ht[1].table.
// Find the index of the bucket the dictEntry should go into
static long _dictKeyIndex(dict *d, const void *key, uint64_t hash, dictEntry **existing)
{
    unsigned long idx, table;
    dictEntry *he;
    if (existing) *existing = NULL;

    /* Expand the hashtable if necessary */
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return -1;
    for (table = 0; table <= 1; table++) {
        idx = hash & d->ht[table].sizemask;
        /* Search if this slot does not already contain the given key */
        he = d->ht[table].table[idx];
        while (he) {
            if (key == he->key || dictCompareKeys(d, key, he->key)) {
                if (existing) *existing = he;
                return -1;
            }
            he = he->next;
        }
        // If the dict is not rehashing, only ht[0] is searched
        if (!dictIsRehashing(d)) break;
    }
    return idx;
}
// Expand the hashtable if necessary
#define DICT_HT_INITIAL_SIZE 4
static int dict_can_resize = 1;
static unsigned int dict_force_resize_ratio = 5;

static int _dictExpandIfNeeded(dict *d)
{
    /* Rehash already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hashtable is empty, initialize it to the default size (4). */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* Expand when the element/bucket ratio reaches 1:1 and resizing is
     * allowed, or when the ratio exceeds the forced-resize threshold,
     * provided the memory check also passes. */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio) &&
        dictTypeExpandAllowed(d))
    {
        return dictExpand(d, d->ht[0].used + 1);
    }
    return DICT_OK;
}
Once the bucket index is found, memory is allocated for the dictEntry, the entry is linked into the head of that bucket's list, and finally the key and value are written into the newly allocated dictEntry.
While dictAddRaw() runs, if the given key already exists in the hashtable, the dictEntry for that key is found and assigned to *existing, and NULL is returned. Updating an element in the hashtable therefore means updating the value of this existing entry; this is exactly how Redis implements hashtable updates, except that once the update completes, the memory occupied by the old value must be freed.
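This update path can be seen in dictReplace() from the Redis 6.x source: it first attempts dictAddRaw(), and when the key already exists it overwrites the existing entry's value and only then frees the old one (the copy into auxentry matters when the old and new values are the same pointer).

int dictReplace(dict *d, void *key, void *val)
{
    dictEntry *entry, *existing, auxentry;

    /* Try to add the element. If the key
     * does not exists dictAdd will succeed. */
    entry = dictAddRaw(d,key,&existing);
    if (entry) {
        dictSetVal(d, entry, val);
        return 1;
    }

    /* Set the new value and free the old one. Note that it is important
     * to do that in this order, as the value may just be exactly the same
     * as the previous one. In this context, don't free the old value. */
    auxentry = *existing;
    dictSetVal(d, existing, val);
    dictFreeVal(d, &auxentry);
    return 0;
}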
⒊ Deleting elements
There are two ways to delete an element from the hashtable: dictDelete() and dictUnlink(). The difference is that dictDelete() removes the dictEntry from the hashtable and also frees the memory it occupies, while dictUnlink() only removes the dictEntry from the hashtable and returns it; if the memory occupied by the entry needs to be released, an additional call to dictFreeUnlinkedEntry() is required.
If you need to do something with an element before removing it completely, choose dictUnlink(): it returns the dictEntry after removing it from the hashtable, the caller performs the desired operation on it, and then calls dictFreeUnlinkedEntry() to release its memory. With dictDelete(), the caller would first have to search the hashtable for the entry, operate on it, and then call dictDelete(), which searches again. In short, the first approach saves one hashtable lookup compared to the second.
static dictEntry *dictGenericDelete(dict *d, const void *key, int nofree) {
    uint64_t h, idx;
    dictEntry *he, *prevHe;
    int table;

    if (d->ht[0].used == 0 && d->ht[1].used == 0) return NULL;

    if (dictIsRehashing(d)) _dictRehashStep(d);
    h = dictHashKey(d, key);

    for (table = 0; table <= 1; table++) {
        idx = h & d->ht[table].sizemask;
        he = d->ht[table].table[idx];
        prevHe = NULL;
        while (he) {
            if (key == he->key || dictCompareKeys(d, key, he->key)) {
                /* Unlink the element from the list */
                if (prevHe)
                    prevHe->next = he->next;
                else
                    d->ht[table].table[idx] = he->next;
                if (!nofree) {
                    dictFreeKey(d, he);
                    dictFreeVal(d, he);
                    zfree(he);
                }
                d->ht[table].used--;
                return he;
            }
            prevHe = he;
            he = he->next;
        }
        if (!dictIsRehashing(d)) break;
    }
    return NULL; /* not found */
}
Both dictDelete() and dictUnlink() implement deletion by calling dictGenericDelete(); the difference is that dictDelete() passes nofree as 0 while dictUnlink() passes it as 1. When nofree is 0, removing the dictEntry from the hashtable also frees the memory it occupies.
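A minimal sketch of the unlink-then-free pattern (the caller function here is hypothetical; dictUnlink() and dictFreeUnlinkedEntry() are the real APIs): the element is located only once, inspected, then released.

/* Hypothetical caller, not Redis source */
void deleteAfterInspect(dict *d, const void *key) {
    dictEntry *de = dictUnlink(d, key);  /* removed from the table, not freed */
    if (de == NULL) return;              /* key not found */
    /* ... operate on de->key / de->v.val here ... */
    dictFreeUnlinkedEntry(d, de);        /* free key, value and the entry itself */
}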
⒋ Expansion
As mentioned above, given sufficient memory, a dict needs to be expanded when the ratio of elements to buckets in the dictht reaches 1:1 and the dict is allowed to resize (dict_can_resize = 1), or when that ratio exceeds the forced-resize threshold (dict_force_resize_ratio = 5).
So how does Redis decide whether the current memory can accommodate an expansion? The maximum memory available to Redis is set in the configuration file; if this value is not set, it is unlimited by default. If the ratio of elements to buckets has not exceeded the maximum load factor, Redis first checks quickly whether currently used memory plus the space needed by the expansion fits within the limit; if not, it subtracts the memory that does not count toward the limit and checks again. If the limit would still be exceeded, the expansion is not allowed.
Given this logic, if an expansion requires a large amount of memory, the Redis service may block until that memory has been allocated. In addition, if the available memory is no longer sufficient for the expansion, Redis may also evict some data so that total memory consumption stays within the limit set in the configuration file.
/* Decide whether memory allows the expansion */
static int dictTypeExpandAllowed(dict *d) {
    if (d->type->expandAllowed == NULL) return 1;
    return d->type->expandAllowed(
                    _dictNextPower(d->ht[0].used + 1) * sizeof(dictEntry*),
                    (double)d->ht[0].used / d->ht[0].size);
}

#define HASHTABLE_MAX_LOAD_FACTOR 1.618

/* If the ratio of elements to buckets does not exceed the maximum load factor,
 * check whether the available memory is sufficient for the expansion.
 * If the load factor is exceeded, allow the expansion regardless. */
int dictExpandAllowed(size_t moreMem, double usedRatio) {
    if (usedRatio <= HASHTABLE_MAX_LOAD_FACTOR) {
        return !overMaxmemoryAfterAlloc(moreMem);
    } else {
        return 1;
    }
}

/* Check whether the allocation would push memory usage over maxmemory */
int overMaxmemoryAfterAlloc(size_t moremem) {
    if (!server.maxmemory) return 0; /* No limit. */

    /* Check quickly. */
    size_t mem_used = zmalloc_used_memory();
    if (mem_used + moremem <= server.maxmemory) return 0;

    size_t overhead = freeMemoryGetNotCountedMemory();
    mem_used = (mem_used > overhead) ? mem_used - overhead : 0;
    return mem_used + moremem > server.maxmemory;
}
Because Redis uses a single-process, single-thread model, blocking the service during expansion must be avoided, which is why dict.ht is designed as an array of length 2. The service keeps interacting with clients through dict.ht[0] while the expansion takes place in dict.ht[1], avoiding the service stall a one-shot expansion could cause. After expansion, the elements stored in dict.ht[0] are rehashed and moved to dict.ht[1]; once the rehash completes, dict.ht[1] is assigned to dict.ht[0], and dict.ht[1] is then reset for subsequent expansions.
Also to avoid blocking the Redis service, the rehash is not performed in one go: each dict read/write operation moves at most one bucket (n = 1). In addition, because of how hashing distributes keys, empty buckets inevitably appear; to respond to clients as quickly as possible, a rehash step returns early once it has visited 10 * n empty buckets. During rehash, newly added elements go directly into ht[1].
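The single-bucket move is driven by the helper _dictRehashStep(), which the read/write paths shown earlier (dictAddRaw(), dictGenericDelete(), etc.) invoke; in the Redis 6.x source it is simply:

/* Performs one step of rehashing, but only if rehashing has not been
 * paused (pauserehash > 0, e.g. while a safe iterator is in use). */
static void _dictRehashStep(dict *d) {
    if (d->pauserehash == 0) dictRehash(d,1);
}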
int _dictExpand(dict *d, unsigned long size, int* malloc_failed)
{
    if (malloc_failed) *malloc_failed = 0;

    /* The new size cannot be smaller than the number of elements already stored */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    dictht n; /* the new hash table */
    /* The number of buckets must be a power of 2 */
    unsigned long realsize = _dictNextPower(size);

    /* Detect overflows */
    if (realsize < size || realsize * sizeof(dictEntry*) < realsize)
        return DICT_ERR;

    /* Rehashing to the same table size is not useful. */
    if (realsize == d->ht[0].size) return DICT_ERR;

    n.size = realsize;
    n.sizemask = realsize - 1;
    if (malloc_failed) {
        n.table = ztrycalloc(realsize*sizeof(dictEntry*));
        *malloc_failed = n.table == NULL;
        if (*malloc_failed)
            return DICT_ERR;
    } else
        n.table = zcalloc(realsize*sizeof(dictEntry*));

    n.used = 0;

    /* For a newly created dict this is the first initialization:
     * assign the table directly to ht[0]. */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Otherwise put the new table in ht[1] and prepare for rehashing. */
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

// Compute the number of buckets after expansion
static unsigned long _dictNextPower(unsigned long size)
{
    unsigned long i = DICT_HT_INITIAL_SIZE;

    if (size >= LONG_MAX) return LONG_MAX + 1LU;
    while (1) {
        if (i >= size)
            return i;
        i *= 2;
    }
}

int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit */
    if (!dictIsRehashing(d)) return 0;

    while (n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while (d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            /* Too many empty buckets visited: stop this step and return */
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move the data in this bucket */
        while (de) {
            uint64_t h;

            nextde = de->next;
            /* Compute the element's index in ht[1] */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* If every bucket in ht[0] has been moved to ht[1],
     * assign ht[1] to ht[0] and reset ht[1] */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}
⒌ Background task: cron
The Redis service maintains a background periodic task through the function databasesCron(). Its frequency is controlled by the hz configuration item, which sets how many times per second the function runs (10 by default). This background task cleans up expired keys, shrinks hashtables, and performs rehash while the Redis service is idle.
- rehash
As mentioned earlier, to keep the Redis service responding to client requests as quickly as possible, at most one bucket is moved per dict operation. When the service is idle, however, rehash is exempt from this restriction.
When the Redis service is idle, Redis spends 1 ms rehashing each dict: dictRehash() is called with n = 100, and after each call it checks whether more than 1 ms has elapsed since the start time; if so, the rehash stops, otherwise it continues.
int incrementallyRehash(int dbid) {
    /* Keys dictionary */
    if (dictIsRehashing(server.db[dbid].dict)) {
        dictRehashMilliseconds(server.db[dbid].dict,1);
        return 1; /* already used our millisecond for this loop... */
    }
    /* ... */
    return 0;
}

int dictRehashMilliseconds(dict *d, int ms) {
    if (d->pauserehash > 0) return 0;

    long long start = timeInMilliseconds();
    int rehashes = 0;

    while (dictRehash(d,100)) {
        rehashes += 100;
        if (timeInMilliseconds()-start > ms) break;
    }
    return rehashes;
}
- Shrinking
As mentioned earlier, a dict expands when the ratio of elements to buckets reaches a certain value. Similarly, the Redis service shrinks a dict when the fill ratio drops below a threshold (10%) and the number of buckets is greater than the minimum allowed (4).
void tryResizeHashTables(int dbid) {
    if (htNeedsResize(server.db[dbid].dict))
        dictResize(server.db[dbid].dict);
    /* ... */
}

#define HASHTABLE_MIN_FILL 10 /* Minimal allowed fill: 10% */

int htNeedsResize(dict *dict) {
    long long size, used;

    size = dictSlots(dict);
    used = dictSize(dict);
    return (size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < HASHTABLE_MIN_FILL));
}

int dictResize(dict *d)
{
    unsigned long minimal;

    if (!dict_can_resize || dictIsRehashing(d)) return DICT_ERR;
    minimal = d->ht[0].used;
    if (minimal < DICT_HT_INITIAL_SIZE)
        minimal = DICT_HT_INITIAL_SIZE;
    return dictExpand(d, minimal);
}
Shrinking uses the same machinery as expansion, except that it reduces the number of buckets to the smallest power of 2 that can still hold all the current elements. For example, a dict holding 1,000 elements in 65,536 buckets (fill below 10%) is resized via dictExpand(d, 1000), and _dictNextPower(1000) yields a new bucket count of 1,024.