Redis source code analysis series of articles

Installing Redis on Linux and its common APIs

Why analyze Redis from its source code

The low-level implementation of strings – the dynamic string SDS

Everything you need to know about the Redis doubly linked list

Interviewer: tell me about the basics of Redis hashes…… (interview questions drawn from reading)

Sure you don't want to understand skip lists? 😏


Illustration of the Redis integer set intset upgrade process

preface

Hello everyone, nice to see you again 😊.

Over the past few weeks we have looked at the underlying data structures of Redis, such as the dynamic string SDS, the doubly linked list adlist, the dictionary dict, the skip list, and the integer set intset. If you are not familiar with the common types or underlying data structures of Redis, please see the links above.

Today we look at the compressed list (ziplist), one of the underlying implementations of zset (used when the data volume is small). If you are not familiar with zset, see the links above.

The concept of compressed lists

Traditional arrays

Like the underlying structures before it, the compressed list is a data storage structure designed by Redis.

It is similar to an array in that data is stored in a contiguous block of memory. But it also differs from an array: a conventional array storing strings of different lengths must pick the maximum string length as the memory size of every slot.

As shown in the figure below, there are five elements in total, each with a different length. Here 5 is chosen as the memory size of every element; any smaller size could not hold the first element "hello" or the second element "world" completely, and data would be lost.



Existing problems

As mentioned above, a conventional array must use the largest element's size as the memory size of every slot. If just one element is very large while all the others are small, sizing every slot for the large one wastes a great deal of memory.
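The waste is easy to quantify. The sketch below compares a fixed-width layout (every slot padded to the longest element) with exact-size storage. "hello" and "world" come from the figure; the other three values are made up here so the sizes match the byte counts 1, 4, and 3 used later in the article.

```c
#include <assert.h>
#include <string.h>

// Illustrative element values: "hello" and "world" from the figure,
// the rest chosen to give lengths 1, 4 and 3.
static const char *elems[] = {"hello", "world", "a", "nice", "day"};
#define NELEMS (sizeof(elems)/sizeof(elems[0]))

// Every slot padded to the longest element, like a conventional array
size_t fixed_width_bytes(void) {
    size_t max = 0;
    for (size_t i = 0; i < NELEMS; i++) {
        size_t len = strlen(elems[i]);
        if (len > max) max = len;
    }
    return max * NELEMS;
}

// Each slot exactly as wide as its content, like a compressed list
size_t exact_bytes(void) {
    size_t total = 0;
    for (size_t i = 0; i < NELEMS; i++)
        total += strlen(elems[i]);
    return total;
}
```

With these five elements the fixed-width layout needs 5*5 = 25 bytes while the exact layout needs 5+5+1+4+3 = 18 bytes, and the gap only grows as the largest element gets bigger.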

So how can we improve?

Introducing the compressed list

Redis introduced the concept of the compressed list: each element occupies only as much memory as its content actually needs; everything is practical and nothing is wasted.

As shown in the figure below, the memory size of each node is determined by what it actually stores: the first node occupies 5 bytes, the second node 5 bytes, the third node 1 byte, the fourth node 4 bytes, and the fifth node 3 bytes.



This raises another problem: when traversing, we do not know the size of each element, so we cannot accurately calculate the start position of the next node. The actual storage does not contain the neat separators drawn in the figure above; nothing marks where the current node ends and the next one begins. So Redis adds a length attribute to record the length of the previous node.

In the figure below, suppose we traverse from the beginning: we have the start address of "hello" but not its end address. Reading the length value 5 tells us that "hello" occupies 5 bytes, so we can jump straight to the start of the next node, "world".
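The idea of the length field can be sketched as a toy traversal over a length-prefixed buffer: each element is stored as a 1-byte length followed by that many payload bytes. This is a hypothetical layout to illustrate the point, not Redis's real node encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

// Count elements in a buffer where every element is stored as
// [1-byte length][payload bytes]. The length prefix is what lets the
// scan jump from one element to the start of the next.
size_t count_entries(const unsigned char *buf, size_t buflen) {
    size_t pos = 0, count = 0;
    while (pos < buflen) {
        uint8_t len = buf[pos]; // length prefix of the current element
        pos += 1 + len;         // skip prefix + payload to reach the next element
        count++;
    }
    return count;
}
```

For the buffer `{5,'h','e','l','l','o',5,'w','o','r','l','d'}` the scan reads the 5, skips over "hello", lands exactly on the second length prefix, and finds two entries.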




Compressed list diagram analysis

The entire compressed list is illustrated below; take a look first, and the details will be explained later.





header

The table header consists of four parts: zlbytes (the total memory in bytes), zltail_offset (the byte offset from the start address to the tail node), zllength (the number of nodes), and the end marker zlend.




  • zlbytes: records the total number of bytes of memory the compressed list occupies.
  • zltail_offset: records the number of bytes between the start address of the compressed list and its tail node (this makes it possible to locate the tail node directly, which facilitates reverse traversal).
  • zllength: records the number of nodes in the compressed list. In the figure above, the number of nodes is 2.
  • zlend: holds the constant 255 (0xFF), which marks the end of the compressed list.
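The header layout above can be sketched by hand-building the 11-byte header of an empty list and reading the fields back. The helper below is a hypothetical illustration using plain memcpy; real Redis accesses these fields through the ZIPLIST_* macros with byte-order conversion.

```c
#include <assert.h>
#include <string.h>
#include <stdint.h>

// Header layout: zlbytes (4 bytes) | zltail_offset (4 bytes) |
// zllength (2 bytes), followed eventually by zlend (1 byte).
enum { HEADER_SIZE = 4 + 4 + 2 };

void make_empty(unsigned char *zl) {
    uint32_t bytes = HEADER_SIZE + 1; // header plus the 1-byte end marker
    uint32_t tail  = HEADER_SIZE;     // tail offset of an empty list
    uint16_t len   = 0;               // no nodes yet
    memcpy(zl,     &bytes, 4);        // zlbytes
    memcpy(zl + 4, &tail,  4);        // zltail_offset
    memcpy(zl + 8, &len,   2);        // zllength
    zl[bytes - 1] = 0xFF;             // zlend
}
```

An empty compressed list therefore takes 11 bytes: 10 bytes of header and the single 0xFF end marker.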

Data nodes

The data node consists of three parts: prev_entry_len, the current data type and encoding, and the value.



  • prev_entry_len: records the length of the previous node.
  • encoding: records the current node's data type and encoding format.
  • value: stores the actual data.

The concrete implementation of compressed list

Compressed list composition

Unlike the earlier structures such as the string SDS, the dictionary, and the skip list, Redis does not encapsulate a struct to hold the compressed list's information. Instead, it defines a series of macros to manipulate the data directly.

In other words, the compressed list is a run of raw bytes that we cannot read directly; Redis retrieves the data by locating and calculating byte offsets.

// Returns the total bytes of the compressed list
#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))
// Returns the tail_offset field of the compressed list, to locate the last node
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))
// Returns the number of nodes in the compressed list
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))
// Returns the number of bytes in the header of the compressed list
// (memory bytes zlbytes, tail node offset zltail_offset, total node count zllength)
#define ZIPLIST_HEADER_SIZE     (sizeof(uint32_t)*2+sizeof(uint16_t))
// Returns the number of bytes in the end marker of the compressed list
#define ZIPLIST_END_SIZE        (sizeof(uint8_t))
// Returns the address of the first node of the compressed list
#define ZIPLIST_ENTRY_HEAD(zl)  ((zl)+ZIPLIST_HEADER_SIZE)
// Returns the address of the last node of the compressed list
#define ZIPLIST_ENTRY_TAIL(zl)  ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))
// Returns the address of the end marker of the compressed list
#define ZIPLIST_ENTRY_END(zl)   ((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-1)

Compressed list node composition

Redis does define a zlentry struct for nodes, but note that this is not how the data is actually encoded in memory.

Isn't it funny to define a struct and then not store it?



Because storing this struct would waste space. On a 64-bit machine it would take 29 bytes per node (5 unsigned ints at 4 bytes each, 1 unsigned char, and 1 pointer at 8 bytes, so 5*4 + 1 + 8 = 29 bytes). That is not what compressed lists were designed for.
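A quick way to check this claim is to declare an equivalent struct and measure it. The struct below is a stand-in, not compiled from Redis's headers; note that on a typical 64-bit ABI, alignment padding before the pointer actually rounds sizeof up to 32 bytes, even more than the 29 counted field by field.

```c
#include <assert.h>
#include <stddef.h>

// Stand-in with the same fields as zlentry. Field by field it needs
// 5*4 + 1 + 8 = 29 bytes; padding before the 8-byte-aligned pointer
// brings sizeof to 32 on a typical 64-bit ABI.
typedef struct zlentry_like {
    unsigned int prevrawlensize;
    unsigned int prevrawlen;
    unsigned int lensize;
    unsigned int len;
    unsigned int headersize;
    unsigned char encoding;
    unsigned char *p;
} zlentry_like;
```

Paying around 32 bytes of bookkeeping per node would defeat the purpose of a structure built to shave single bytes.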

/* We use this function to receive information about a ziplist entry.
 * Note that this is not how the data is actually encoded, it is just what
 * we get filled by a function in order to operate more easily. */
typedef struct zlentry {
    unsigned int prevrawlensize; // bytes used to encode the previous node's length
    unsigned int prevrawlen;     // length of the previous node
    unsigned int lensize;        // bytes used to encode this node's length
    unsigned int len;            // length of this node's data
    unsigned int headersize;     // prevrawlensize + lensize
    unsigned char encoding;      // encoding of this node's data
    unsigned char *p;            // pointer to the start of the node
} zlentry;

Therefore, Redis improves on the struct above by collapsing its fields into three parts stored per node.

prev_entry_len: combines prevrawlensize and prevrawlen.

prev_entry_len is encoded in one byte if the previous node is shorter than 254 bytes.

prev_entry_len is encoded in five bytes if the previous node is 254 bytes or longer. In that case the first byte is the constant 0xFE (254), and the remaining four bytes hold the actual length of the previous node.

encoding: combines lensize and len. Redis derives lensize and len through a set of macro definitions. (The details are not expanded here.)

value: stores the actual data.
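The 1-byte versus 5-byte rule above can be sketched as a small stand-in function. This is a simplified model, not the real zipStorePrevEntryLength, which also handles byte order via intrev32ifbe.

```c
#include <assert.h>
#include <string.h>
#include <stdint.h>

// Encode a previous-node length into buf and return the number of
// bytes used: lengths under 254 fit in one byte; otherwise a 0xFE
// marker byte is followed by the length in the next four bytes.
unsigned int encode_prevlen(unsigned char *buf, uint32_t len) {
    if (len < 254) {
        buf[0] = (unsigned char)len;
        return 1;                 // 1-byte encoding
    }
    buf[0] = 0xFE;                // 254 marks the 5-byte encoding
    memcpy(buf + 1, &len, 4);     // actual length in the remaining 4 bytes
    return 5;
}
```

The value 255 (0xFF) is never a valid first byte here because it is reserved for zlend, the end marker of the whole list.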

Advantages of compressed lists

Because all elements are packed into one contiguous block of memory and each node takes only as much space as its content actually needs, compressed lists save a great deal of memory compared with a conventional array.

Disadvantages of compressed lists

Because compressed lists are stored compactly, there is no spare space. This means that inserting a new element requires expanding the memory: the whole list may need to be reallocated and the previous contents copied to the new address in one go.

If there is a lot of data, reallocating memory and copying data is expensive. Therefore, compressed lists are not suitable for storing large strings, and the number of elements should stay small.

Diagram of the chain update process for compressed lists (key)

As mentioned earlier, each node (entry) carries a prevlen field that stores the length of the previous entry. prevlen takes one byte if the previous entry is shorter than 254 bytes, and five bytes if it is 254 bytes or longer. So when an entry grows from under 254 bytes to 254 bytes or more, the prevlen field of the next entry must be expanded from 1 byte to 5 bytes; those 4 extra bytes can in turn push the next entry past the threshold, and so on down the list.

If every entry stores around 253 bytes of content, modifying the first entry will trigger a cascading update of all the following entries, which is expensive.

Therefore, cascading updates occur only when there is a run of consecutive nodes 250–253 bytes long.
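The 250–253 byte condition can be illustrated with a toy model that counts how many prevlen fields must grow from 1 to 5 bytes after the first entry is enlarged. This is a simplification: each entry is treated as its prevlen bytes plus an opaque payload, and Redis's real encoding details are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

// payload[i] = bytes of entry i excluding its prevlen field.
// growth = how many bytes the first entry is enlarged by.
// Returns how many of the following entries need a 5-byte prevlen,
// assuming every entry started out with a 1-byte prevlen (size < 254).
size_t cascade_count(const uint32_t *payload, size_t n, uint32_t growth) {
    size_t flips = 0;
    // size of the previous entry as seen by the next one
    uint32_t prev_size = 1 + payload[0] + growth;
    for (size_t i = 1; i < n; i++) {
        uint32_t prevlen_bytes = (prev_size < 254) ? 1 : 5;
        if (prevlen_bytes == 5) flips++;        // this entry must be expanded
        prev_size = prevlen_bytes + payload[i]; // its own size grows too
    }
    return flips;
}
```

With five entries of 252 payload bytes each (252 + 1 byte of prevlen = 253, right at the edge), growing the first entry by a single byte flips every following entry to a 5-byte prevlen, while leaving it unchanged flips none.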

Step 1

For example, suppose we enlarge the second node. When the compressed list initially looks like the figure below (xxxx represents a string), a cascading update will occur.



Step 2

Because the prevlen field of the third element records the size of the second element, and the second element has grown, the prevlen of the third element must be expanded by four bytes.



Step 3

The length of the third element has been increased, so the prevlen field of the fourth element needs to be changed as well.



Step 4

The length of the fourth element has increased, so the prevlen field of the fifth element needs to be changed as well.

Compressed list source code analysis

ziplistNew: create an empty compressed list

The main steps are: allocate the memory, initialize the header properties, set the end marker to the constant, and finally return the compressed list.

unsigned char *ziplistNew(void) {
    // Header size plus one byte for the end marker
    unsigned int bytes = ZIPLIST_HEADER_SIZE+1;
    // Allocate the memory
    unsigned char *zl = zmalloc(bytes);
    // Initialize the header properties
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    ZIPLIST_LENGTH(zl) = 0;
    // Set the end marker to the constant 255
    zl[bytes-1] = ZIP_END;
    return zl;
}

Cascading update (emphasis)

unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), rawlen, rawlensize;
    size_t offset, noffset, extra;
    unsigned char *np;
    zlentry cur, next;

    // Loop until the end marker is reached
    while (p[0] != ZIP_END) {
        // Fill cur with the current node's information
        zipEntry(p, &cur);
        // Total size of the current node:
        // rawlen = prev_entry_len + encoding + value
        rawlen = cur.headersize + cur.len;
        // Bytes needed to encode rawlen as a prevlen field
        rawlensize = zipStorePrevEntryLength(NULL,rawlen);

        // If there is no next node, break out of the loop
        if (p[rawlen] == ZIP_END) break;
        // Fill next with the next node's information
        zipEntry(p+rawlen, &next);

        // If next's prevrawlen already equals rawlen, no adjustment is needed
        if (next.prevrawlen == rawlen) break;

        if (next.prevrawlensize < rawlensize) {
            // next's prevlen field is too small: the list must grow
            // Record the offset of the current pointer
            offset = p-zl;
            // Extra bytes needed
            extra = rawlensize-next.prevrawlensize;
            // Resize the compressed list and restore p
            zl = ziplistResize(zl,curlen+extra);
            p = zl+offset;

            // New address of the next node
            np = p+rawlen;
            noffset = np-zl;

            // Update the tail_offset member of the compressed list,
            // unless the next node is the last node
            if ((zl+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))) != np) {
                ZIPLIST_TAIL_OFFSET(zl) =
                    intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+extra);
            }

            // Move the rest of the list to make room for the larger prevlen
            memmove(np+rawlensize,
                np+next.prevrawlensize,
                curlen-noffset-next.prevrawlensize-1);
            // Write the new prevlen into the next node
            zipStorePrevEntryLength(np,rawlen);

            // Move p to the next node and process the node after it
            p += rawlen;
            curlen += extra;
        } else {
            if (next.prevrawlensize > rawlensize) {
                // next's prevlen field is larger than needed; keep the
                // 5-byte encoding instead of shrinking. This wastes a
                // little memory but avoids a move/copy now and another
                // expansion later.
                zipStorePrevEntryLengthLarge(p+rawlen,rawlen);
            } else {
                zipStorePrevEntryLength(p+rawlen,rawlen);
            }

            /* Stop here, as the raw length of "next" has not changed. */
            break;
        }
    }
    return zl;
}

conclusion

This article covered the underlying implementation of the compressed list used by the Redis zset. It started from what a compressed list is, analyzed its main components, explained with several diagrams how cascading updates work, and finally walked through the compressed list source code, such as the creation process and the cascading update process, interspersed with examples and process diagrams.

If you feel that writing is also good, trouble to give a praise 👍, your recognition is the power of my writing!

If you think something is wrong, please feel free to comment.

All right, bye.

Thank you all ❤

Follow the public account "learning Java little sister" and add me as a friend; I will pull you into the "Java technology exchange group" where we can communicate and improve together.