Redis source code analysis - on advanced data structures

Advanced data structure

Base Object (redisObject)

The data structure

#define LRU_BITS 24

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or * LFU data (least significant 8 bits frequency * and most significant 16 bits access time). */
    int refcount;
    void *ptr;
    
} robj;
Copy the code

Basic field details

In the redisObject data structure, the bit field in C is used to save memory, and the bit field operation is very convenient.

It is important to be aware of the portability, such as in some embedded systems where 1 byte is not 8 bits, and if your type is two bytes across, this can lead to unexpected consequences, so proper byte padding is necessary. Of course, don’t be intimidated by these two problems, because bitfields are much easier to manipulate than bitwise operations.

Time calculation: The function to obtain time belongs to the system call, which consumes resources. Therefore, Redis adopts the way of caching and periodically updates the time in the scheduled task as shown in the serverCron function.


int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
     //....
    /* Update the time cache. */
    updateCachedTime();
    //....
}
Copy the code

Note that atomicGet is used in Lruclock because it may be used in other threads, such as cluster states.

string

The data structure

There are three types of strings in Redis. First let’s review the object types

char *strEncoding(int encoding) {
    switch(encoding) {
    case OBJ_ENCODING_RAW: return "raw";
    case OBJ_ENCODING_INT: return "int";
    case OBJ_ENCODING_HT: return "hashtable";
    case OBJ_ENCODING_QUICKLIST: return "quicklist";
    case OBJ_ENCODING_ZIPLIST: return "ziplist";
    case OBJ_ENCODING_INTSET: return "intset";
    case OBJ_ENCODING_SKIPLIST: return "skiplist";
    case OBJ_ENCODING_EMBSTR: return "embstr";
    default: return "unknown"; }}Copy the code

Visible strings are divided into “raw” and “embstr”, and there is also an int.


/* Try to encode a string object in order to save space */
robj *tryObjectEncoding(robj *o) {
    long value;
    sds s = o->ptr;
    size_t len;

    /* Make sure this is a string object, the only type we encode * in this function. Other types use encoded memory efficient * representations but are handled by the commands implementing * the type. */
    serverAssertWithInfo(NULL,o,o->type == OBJ_STRING);

    /* We try some specialized encoding only for objects that are * RAW or EMBSTR encoded, in other words objects that are still * in represented by an actually array of chars. */
    if(! sdsEncodedObject(o))return o;

    /* It's not safe to encode shared objects: shared objects can be shared * everywhere in the "object space" of Redis and may end in places where * they are not handled. We handle them only as values in the keyspace. */
     if (o->refcount > 1) return o;

    /* Check if we can represent this string as a long integer. * Note that we are sure that a string larger than 20 chars is not * representable as a 32 nor 64 bit integer. */
    len = sdslen(s);
    if (len <= 20 && string2l(s,len,&value)) {
        /* This object is encodable as a long. Try to use a shared object. * Note that we avoid using shared integers when maxmemory is used * because every object needs to have a private LRU field for the LRU * algorithm to work well. */
        if ((server.maxmemory == 0| |! (server.maxmemory_policy & MAXMEMORY_FLAG_NO_SHARED_INTEGERS)) && value >=0 &&
            value < OBJ_SHARED_INTEGERS)
        {
            decrRefCount(o);
            incrRefCount(shared.integers[value]);
            return shared.integers[value];
        } else {
            if (o->encoding == OBJ_ENCODING_RAW) sdsfree(o->ptr);
            o->encoding = OBJ_ENCODING_INT;
            o->ptr = (void*) value;
            returno; }}/* If the string is small and is still RAW encoded, * try the EMBSTR encoding which is more efficient. * In this representation the object and the SDS string are allocated * in the same chunk of memory to save space and cache misses. */
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT) {
        robj *emb;

        if (o->encoding == OBJ_ENCODING_EMBSTR) return o;
        emb = createEmbeddedStringObject(s,sdslen(s));
        decrRefCount(o);
        return emb;
    }

    /* We can't encode the object... * * Do the last try, and at least optimize the SDS string inside * the string object to require little space, in case there * is more than 10% of free space at the end of the SDS string. * * We do that only for relatively large strings as this branch * is only entered if the length of the string is greater than * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. * /
    if (o->encoding == OBJ_ENCODING_RAW &&
        sdsavail(s) > len/10)
    {
        o->ptr = sdsRemoveFreeSpace(o->ptr);
    }

    /* Return the original object. */
    return o;
}
Copy the code

You can view it using commands

Embstr and RAW analysis

type	The characteristics of	advantages	disadvantages
embstr	1. Memory space is allocated only once, so ROBj and SDS are continuous; 2. Read only; 3. When the Embstr string needs to be modified, it is converted to RAW and remains raw thereafter	1. Create and delete only once. 2. Search fast	1. Reallocation involves robj and the entire SDS object, so embstr is read-only
raw	1. Robj and SDS are discontinuous; 2. Can be modified

#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}

Copy the code

Applicable scenario
- Cache function: mysql storage, Redis cache
- Counter: like times, video playback times
- Traffic limiting: See Distributed traffic limiting based on Redis

    public static final String DISTRIBUTE_LIMIT_SCRIPT = "local key = KEYS[1] \n"
            + "local limit = tonumber(ARGV[1]) \n"
            + "local current = tonumber(redis.call('get',key) or '0') \n"
            + " \n"
            + "if (limit < current + 1) then \n"
            + " return 0 \n"
            + "else \n"
            + " redis.call('incrby',key,'1') \n"
            + " redis.call('expire',key,'2') \n"
            + " return 1 \n"
            + "end \n";


 /* * 1 indicates pass, and 0 indicates traffic limiting * */
    private static boolean tryAcquire(ResourceLimiterCfg apiRateLimiter,Cluster redisClient){
        Long distributeLimitResult=1L;
        try{ String key= RedisCachePrefix.DISTRIBUTE_LIMIT_PRE+ (StringUtils.isNotEmpty(apiRateLimiter.getPrefix())? apiRateLimiter.getPrefix()+SplitCharacter.SPLIT_COLON.key:"")+apiRateLimiter.getResource()+SplitCharacter.SPLIT_COLON.key+System.currentTimeMillis()/1000;
            if(distributeLimit==null){
                distributeLimit = redisClient.scriptLoad(RedisLuaScript.DISTRIBUTE_LIMIT_SCRIPT);
            }

            distributeLimitResult = (Long) redisClient.evalsha(distributeLimit, Collections.singletonList(key),
                    Collections.singletonList(apiRateLimiter.getPerSecCount().toString()), true, ScriptOutputType.INTEGER);

            log.info("tryAcquire key={},distributeLimitResult={} ",key,distributeLimitResult  );

        }catch (Exception e){
            log.error("tryAcquire={}",e);
            e.printStackTrace();
        }finally {
             return distributeLimitResult.intValue()==1; }}Copy the code

Pay attention to the point

Redis stores three types of strings: integers (shared int and unshared), embstr, and raw;
Redis specifies that the string length cannot be greater than 512 MB


static int checkStringLength(client *c, long long size) {
    if (size > 512*1024*1024) {
        addReplyError(c,"string exceeds maximum allowed size (512MB)");
        return C_ERR;
    }
    return C_OK;
}
Copy the code

The list of

The data structure

The data structure

Quicklist = quickList; quickList = quickList; quickList = quickList;

void listTypePush(robj *subject, robj *value, int where) {
    if (subject->encoding == OBJ_ENCODING_QUICKLIST) {
        int pos = (where == LIST_HEAD) ? QUICKLIST_HEAD : QUICKLIST_TAIL;
        value = getDecodedObject(value);
        size_t len = sdslen(value->ptr);
        quicklistPush(subject->ptr, value->ptr, len, pos);
        decrRefCount(value);
    } else {
        serverPanic("Unknown list encoding"); }}Copy the code


/* quicklistNode is a 32 byte struct describing a ziplist for a quicklist. * We use bit fields keep the quicklistNode at  32 bytes. * count: 16 bits, max 65536 (max zl bytes is 65k, so max count actually < 32k). * encoding: 2 bits, RAW=1, LZF=2. * container: 2 bits, NONE=1, ZIPLIST=2. * recompress: 1 bit, bool, true if node is temporarry decompressed for usage. * attempted_compress: 1 bit, boolean, used for verifying during testing. * extra: 10 bits, free for future use; pads out the remainder of 32 bits */
typedef struct quicklistNode {
    // The former node of the bidirectional list
    struct quicklistNode *prev;
    // The last node of the bidirectional list
    struct quicklistNode *next;
    /// points to a ziplist structure when recompress is not set
    // Set the compressed data parameter recompress to the quicklistLZF structure
    unsigned char *zl;
    // Compress the total length of the list ziplist
    unsigned int sz;             /* ziplist size in bytes */
    // Total number of entries in each ziplist
    unsigned int count : 16;     /* count of items in ziplist */
    // Indicates whether the LZF compression algorithm is used to compress the QuickList node. 1 indicates that the quickList node is compressed, and 2 indicates that the Quicklist node is not compressed
    unsigned int encoding : 2;   /* RAW==1 or LZF==2 */
    // Indicates whether ziplist is enabled for storage
    unsigned int container : 2;  /* NONE==1 or ZIPLIST==2 */
    // Record whether the point has been compressed before
    unsigned int recompress : 1; /* was this node previous compressed? * /
    // Tests are used
    unsigned int attempted_compress : 1; /* node can't compress; too small */
    // additional extension bits, 10bits in length
    unsigned int extra : 10; /* more bits to steal for future usage */
} quicklistNode;

/* When LZF compression algorithm is specified to compress ziplist entry nodes, QuicklistLZF is a 4+N byte struct holding 'sz' followed by 'sz' is  byte length of 'compressed' field. * 'compressed' is LZF data with total (compressed) length 'sz' *NOTE: uncompressed length is stored in quicklistNode->sz.
 * When quicklistNode->zl is compressed, node->zl points to a quicklistLZF */
typedef struct quicklistLZF {
    //// indicates the size of ziplist compressed by LZF algorithm
    unsigned int sz; /* LZF size in bytes*/
    // Compressed data, flexible array
    char compressed[];
} quicklistLZF;

/* quicklist is a 40 byte struct (on 64-bit systems) describing a quicklist. * 'count' is the number of total entries. *  'len' is the number of quicklist nodes. * 'compress' is: -1 if compression disabled, otherwise it's the number * of quicklistNodes to leave uncompressed at ends of quicklist. * 'fill' is the user-requested  (or default) fill factor. */
typedef struct quicklist {
    // List header
    quicklistNode *head;
    // The end of the list
    quicklistNode *tail;
    // The number of entries in all QuickListNodes
    unsigned long count;        /* total count of all entries in all ziplists */
    // Number of QuickListNodes, which is the length of the QuickList
    unsigned long len;          /* number of quicklistNodes */
    // The size of the ziplist
    int fill : 16;              /* fill factor for individual nodes */
    // save the compression degree value, which is set in the configuration file. The length is 16bits. 0 indicates no compression
    unsigned int compress : 16; /* depth of end nodes not to compress; 0=off */
} quicklist;

// quicklistNode entry structure in QuickList
typedef struct quicklistEntry {
    // Points to the owning QuickList pointer
    const quicklist *quicklist;
    // A pointer to the owning quicklistNode
    quicklistNode *node;
    // Pointer to the current Ziplist structure
    unsigned char *zi;
    // The value member of the string pointing to the current ziplist structure
    unsigned char *value;
    // Points to the value member of the current Ziplist structure
    long long longval;
    // The number of bytes in the current Ziplist structure
    unsigned int sz;
    // Save the offset relative to ziplist
    int offset;
} quicklistEntry;

Copy the code

Why does this data structure exist?
- Bidirectional linked lists are efficient to insert and delete, but are not memory friendly, and require access to the two extra Pointers
- Continuous memory, represented by arrays, has high insertion and deletion time complexity, but is memory friendly (local principle), so the combination of the two gives rise to quicklist data structure. In fact, in C++ STL deque design pattern is also the same.

Consider: How to set the ratio between the length of the list and the representation of the array (ziplist)? Let’s consider the extreme case. If the list length is 1, then it degenerates into a Ziplist, which may result in OOM because it can’t find a large contiguous memory. If ziplist is 1, then it degenerates into a bidirectional linked list, which is very memory unfriendly, and can cause a lot of memory fragmentation, affecting performance, right? So what to do? And see the specific analysis later.

Specific field analysis

Fill: 16bit to control the size of the ziplist and store itlist-max-ziplist-sizeThe value of the parameter

list-max-ziplist-size -2
Copy the code

As you can see, the positive value represents the number of Ziplist entries in each Quicklist.


// Add a new node to the original node
REDIS_STATIC int _quicklistNodeAllowInsert(const quicklistNode *node,
                                           const int fill, const size_t sz) {
     / /...
    // Here, count represents the total number of entries
    else if ((int)node->count < fill)
        return 1;
     //....
}

Copy the code

Compress: 16 bits. This parameter is used to compress a nodelist-compress-depthThe value of the parameter

Because when there is a lot of data, the data that is most easily accessed is likely to be at both ends, and the data in the middle is accessed infrequently (and with poor performance). If this is the case, the list also provides an option to compress the data nodes in the middle, further saving memory. See the configuration parameter list-compressed-depth.

						list-compress-depth 0
Copy the code

Note that it is node level, not entry level, but when node is compressed, the entire ziplist is compressed.

0: this is a special value, indicating that none is compressed. This is the default value for Redis. 1: indicates that one node at both ends of the QuickList is not compressed, and the middle node is compressed. 2: indicates that two nodes at both ends of the QuickList are not compressed, and the middle node is compressed. 3: indicates that three nodes at both ends of the QuickList are not compressed, and the middle node is compressed. And so on...Copy the code

Therefore, the head and tail nodes are not compressed because they are often used. For compression this is LZF(see for yourself).

recompress

After lindex is executed, the node is decompressed temporarily. Recompress indicates that the node will be compressed at a later time.

More important operations

Insertion is divided into head and tail

void quicklistPush(quicklist *quicklist, void *value, const size_t sz,
                   int where) {
    if (where == QUICKLIST_HEAD) {
        quicklistPushHead(quicklist, value, sz);
    } else if(where == QUICKLIST_TAIL) { quicklistPushTail(quicklist, value, sz); }}/* Add new entry to head node of quicklist. * * Returns 0 if used existing head. * Returns 1 if new head created. */
int quicklistPushHead(quicklist *quicklist, void *value, size_t sz) {
    quicklistNode *orig_head = quicklist->head;
    if (likely(
            _quicklistNodeAllowInsert(quicklist->head, quicklist->fill, sz))) {
        quicklist->head->zl =
            ziplistPush(quicklist->head->zl, value, sz, ZIPLIST_HEAD);
        quicklistNodeUpdateSz(quicklist->head);
    } else {
        quicklistNode *node = quicklistCreateNode();
        node->zl = ziplistPush(ziplistNew(), value, sz, ZIPLIST_HEAD);

        quicklistNodeUpdateSz(node);
        
        _quicklistInsertNodeBefore(quicklist, quicklist->head, node);
    }
    quicklist->count++;
    quicklist->head->count++;
    return(orig_head ! = quicklist->head); }/* Add new entry to tail node of quicklist. * * Returns 0 if used existing tail. * Returns 1 if new tail created. */
int quicklistPushTail(quicklist *quicklist, void *value, size_t sz) {
    quicklistNode *orig_tail = quicklist->tail;
    if (likely(
            _quicklistNodeAllowInsert(quicklist->tail, quicklist->fill, sz))) {
        quicklist->tail->zl =
            ziplistPush(quicklist->tail->zl, value, sz, ZIPLIST_TAIL);
        quicklistNodeUpdateSz(quicklist->tail);
    } else {
        quicklistNode *node = quicklistCreateNode();
        node->zl = ziplistPush(ziplistNew(), value, sz, ZIPLIST_TAIL);

        quicklistNodeUpdateSz(node);
        _quicklistInsertNodeAfter(quicklist, quicklist->tail, node);
    }
    quicklist->count++;
    quicklist->tail->count++;
    return(orig_tail ! = quicklist->tail); } REDIS_STATICvoid __quicklistInsertNode(quicklist *quicklist,
                                        quicklistNode *old_node,
                                        quicklistNode *new_node, int after) {
    if (after) {
        new_node->prev = old_node;
        if (old_node) {
            new_node->next = old_node->next;
            if (old_node->next)
                old_node->next->prev = new_node;
            old_node->next = new_node;
        }
        if (quicklist->tail == old_node)
            quicklist->tail = new_node;
    } else {
        new_node->next = old_node;
        if (old_node) {
            new_node->prev = old_node->prev;
            if (old_node->prev)
                old_node->prev->next = new_node;
            old_node->prev = new_node;
        }
        if (quicklist->head == old_node)
            quicklist->head = new_node;
    }
    /* If this insert creates the only element so far, initialize head/tail. */
    if (quicklist->len == 0) {
        quicklist->head = quicklist->tail = new_node;
    }

    if (old_node)
        quicklistCompress(quicklist, old_node);

    quicklist->len++;
}
Copy the code

In __quicklistInsertNode, it can be a bit more complicated because of the subscript insertion support, and you need to consider whether compression is needed.

_quicklistNodeAllowInsert Checks whether the server can be directly inserted

/* Optimization levels for size-based filling */
static const size_t optimization_level[] = {4096.8192.16384.32768.65536};

The sz argument represents the total number of bytes inserted into the object
REDIS_STATIC int _quicklistNodeAllowInsert(const quicklistNode *node,
                                           const int fill, const size_t sz) {
    if(unlikely(! node))return 0;

    int ziplist_overhead;
    /* size of previous offset Stores information about the previous object */
    if (sz < 254)
        ziplist_overhead = 1;
    else
        ziplist_overhead = 5;

    /* size of forward offset Specifies the length of the current object */
    if (sz < 64)
        ziplist_overhead += 1;
    else if (likely(sz < 16384))
        ziplist_overhead += 2;
    else
        ziplist_overhead += 5;
    // If the value is an integer, it exceeds the value. Do not consider cascading updates
    /* new_sz overestimates if 'sz' encodes to an integer type */
    // This is the maximum value
    unsigned int new_sz = node->sz + sz + ziplist_overhead;
    if (likely(_quicklistNodeSizeMeetsOptimizationRequirement(new_sz, fill)))
        return 1;
        // When fill is positive, a single node cannot exceed 8KB
    else if(! sizeMeetsSafetyLimit(new_sz))return 0;
    // Integer judgment
    else if ((int)node->count < fill)
        return 1;
    else
        return 0;
}


REDIS_STATIC int
_quicklistNodeSizeMeetsOptimizationRequirement(const size_t sz,
                                               const int fill) {
    if (fill >= 0)
        return 0;

    size_t offset = (-fill) - 1;
    // Determine if the boundary is crossed
    if (offset < (sizeof(optimization_level) / sizeof(*optimization_level))) {
        // If Max bytes are set at this offset, return 1
        if (sz <= optimization_level[offset]) {
            return 1;
        } else {
            return 0; }}else {
        return 0; }}Copy the code

Node compression

REDIS_STATIC int __quicklistCompressNode(quicklistNode *node) { // Compress the node
#ifdef REDIS_TEST
    node->attempted_compress = 1;
#endif

    /* Don't bother compressing small values */
    if (node->sz < MIN_COMPRESS_BYTES)  < 48 bytes does not compress
        return 0;

    quicklistLZF *lzf = zmalloc(sizeof(*lzf) + node->sz);

    /* Cancel if compression fails or doesn't compress small enough */
    if (((lzf->sz = lzf_compress(node->zl, node->sz, lzf->compressed,
                                 node->sz)) == 0) ||
        lzf->sz + MIN_COMPRESS_IMPROVE >= node->sz) {   // If compression fails or the space saved after compression is less than 8 bytes, compression is abandoned
        /* lzf_compress aborts/rejects compression if value not compressable. */
        zfree(lzf);
        return 0;
    }
    lzf = zrealloc(lzf, sizeof(*lzf) + lzf->sz);    // Reallocate memory
    zfree(node->zl);    // Release the original node
    node->zl = (unsigned char *)lzf;    // Assign the compression node to node
    node->encoding = QUICKLIST_NODE_ENCODING_LZF;   // Record the encoding
    node->recompress = 0;
    return 1;
}
Copy the code

List some features

Blocking and non-blocking

Some operations on List support blocking and non-blocking. Focus on blocking — when a given key does not exist, or contains an empty list of keys, BLPOP or BLPOP will block the connection until another client performs a [LR]PUSH on these keys to put a new data in the list of any key. This command will unblock the client that calls BLPOP or BLPOP.

In fact, the implementation is very simple, determine whether the timeout, or there is data, otherwise do not return the line.

BLPOP key [key ...]  timeout BRPOP key [key ...]  timeout BRPOPLPUSHsource destination timeout

Copy the code

Usage scenarios
- Common data structures
  - Lpush + lpop = Stack (Stack)
  - Lpush +rpop=Queue
  - Lpush + LTRIM =Capped Collection
  - Lpush + brPOP =Message Queue
- The article lists
  
  You can use composition to store articles with hashes and list articles with lists

Personal public account

Please follow and contribute to the public account coderFan.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Redis source code analysis – on advanced data structures

Advanced data structure

Base Object (redisObject)

string

The list of

The data structure

Personal public account

Redis source code analysis – on advanced data structures

Advanced data structure

Base Object (redisObject)

string

The list of

The data structure

Personal public account

Related Posts

Ctrip ticket big data architecture best practices

How do programmers choose the first company

Hand in Hand, Hand out Java – Web Robot I (Initial)