String low-level implementation - dynamic String SDS

Redis source code analysis series of articles

Redis source code series installation and common APIS in Liunx

Why analyze from Redis source code

preface

Last time we looked at what Redis is, how to install it on Linux, common data types and API usage, if you don’t understand, please go to the home page.

Redis is written in C, and there are no string,list,hash, set, and zset data types in C. So how does C implement these data types? We start from the article, will begin to analyze the source 😁.

Use the API

Str1: helloWorld = str1: helloWorld = str1: helloWorld = str1: helloWorld = str1: helloWorld = str1: helloWorld = str1: helloWorld = str1: helloWorld = str1: helloWorld = str1: helloWorld = str1: helloWorld

If we set the str2 to helloworldhelloworldhelloworldhelloworldhell, length of 44, then use the debug object + a variable name under way to see, pay attention to the red code for embstr.

But when we set to helloworldhelloworldhelloworldhelloworldhello, length of 45, and then use the debug object + a variable name under way to see, pay attention to the red code for raw.

Finally, set str3 to the integer 100, and use debug Object + variable name. Note that the code marked in red is int.

Therefore, there are three ways to store the Redis string type. When the string length is less than or equal to 44, embstr is used at the bottom. When the string length is greater than 44, raw is used. When the setting is an integer, the underlying is int.

Differences between EMbstr and RAW

The outermost layer of all types of data structures is a RedisObject, which this part will say, just to get a sense of it, because that’s not the focus of this article. If the string is less than or equal to 44, the actual data and RedisObject are adjacent to each other in memory, as shown in the following figure.

If the string is greater than 44, the actual data and the RedisObject are not adjacent in memory, as shown in the following figure.

Again, this is not important, we’ll talk about it in the future, but I’m going to mention it now, just to give you a general idea of what the String type of Redis is. What we’re going to talk about today is the actual data, which is where the pointer points to 😄.

The definition of SDSHdr

In fact, the data is not directly stored, there are also encapsulation, see the following code can be divided into five types, respectively is sdSHDR5, SDSHdr8, sdSHdr16, sdSHdr32, sdSHdr64. The difference between SDSHDR5 and the other four is obvious. Sdshrd5 actually saves more memory space. The other four look similar at first glance, including len of used length, alloc of total length, flags of actual data, buF.

// Define five different structures, sdshdr5,sdshdr8, sdshdr16,sdshdr32,sdshdr64 struct __attribute__ ((__packed__)) sdshdr5 { unsigned char flags; Char buf[]; // Pointer to actual data}; struct __attribute__ ((__packed__)) sdshdr8 { uint8_t len; /* Use length */ uint8_t alloc; /* Total length */ unsigned char flags; char buf[]; }; struct __attribute__ ((__packed__)) sdshdr16 { uint16_t len; uint16_t alloc; unsigned char flags; char buf[]; }; struct __attribute__ ((__packed__)) sdshdr32 { uint32_t len; uint32_t alloc; unsigned char flags; char buf[]; }; struct __attribute__ ((__packed__)) sdshdr64 { uint64_t len; uint64_t alloc; unsigned char flags; char buf[]; };Copy the code

SDS logical diagram

Suppose we set a string to hello, then len of his SDS is 8 and len of his SDS is 6, as shown below. Note: Redis will select the appropriate SDSHDR based on the specific character length, but all types are similar, so the following diagram is simplified.

The advantage of SDS

We can see that this is a rewrap of character arrays, but why, wouldn’t it be easier to just use character arrays? It starts with the fundamental differences between THE C and Java languages.

Faster to get string length

We all know that Java has a length method for strings and a size method for lists, so we can get the size directly. C, however, is more of a low-level implementation, so there is no direct way to use it. This creates a problem. If we want to get the length of an array, we have to start from the beginning. The first ‘\0’ indicates the end of the array. This is too slow to change the array every time you want to get the length. Therefore, the SDS data structure is designed to increase the total length and the used length outside the original character array, so that the used length can be directly obtained each time. The complexity is O(1).

Data is secure and will not be truncated

If the traditional string holds binary files such as pictures and videos, the ‘\0’ may appear in the middle, which will cause data loss if the original logic is followed. So you can use the used length to indicate whether the character array is finished.

SDS critical code analysis

Getting common values (abstracting common methods)

Some common methods are written in SDS.h, such as calculating the length of SDS (len of SDSHDR), calculating the free length of SDS (alloc- len of SDSHDR), calculating the available length of SDS (ALLOc of SDSHDR), and so on. But is there any question, isn’t there a line of code that does this, why abstract the method? So the problem is that in the above, we have divided SDSHDR into five types, namely sdSHDR5, SDSHDR8, SDSHDR16, SDSHDR32, sdSHDR64. So when we actually use it, we want to distinguish the current type and take the corresponding field or set the corresponding field.

Static inline size_t sdslen(const SDS s) {// The flexible array does not take up space, Unsigned char flags = s[-1]; Switch (flags&SDS_TYPE_MASK) {case SDS_TYPE_5://0 
            return SDS_TYPE_5_LEN(flags);
        case SDS_TYPE_8://1
            returnSDS_HDR(8,s)->len; // take len of sdshdr8 abovecase SDS_TYPE_16://2
            return SDS_HDR(16,s)->len;
        case SDS_TYPE_32://3
            return SDS_HDR(32,s)->len;
        case SDS_TYPE_64://5
            return SDS_HDR(64,s)->len;
    }
    return0; } alloc-len static inline size_t sdsavail(const SDS s) {unsigned char flags = s[-1]; switch(flags&SDS_TYPE_MASK) {case SDS_TYPE_5: {
            return 0;
        }
        case SDS_TYPE_8: {
            SDS_HDR_VAR(8,s);
            return sh->alloc - sh->len;
        }
        case SDS_TYPE_16: {
            SDS_HDR_VAR(16,s);
            return sh->alloc - sh->len;
        }
        case SDS_TYPE_32: {
            SDS_HDR_VAR(32,s);
            return sh->alloc - sh->len;
        }
        case SDS_TYPE_64: {
            SDS_HDR_VAR(64,s);
            returnsh->alloc - sh->len; }}return0; } // Set SDSHDR's len static inline void sdssetLen (SDS s, size_t newlen) {unsigned char flags = s[-1]; switch(flags&SDS_TYPE_MASK) {case SDS_TYPE_5:
            {
                unsigned char *fp = ((unsigned char*)s)-1;
                *fp = SDS_TYPE_5 | (newlen << SDS_TYPE_BITS);
            }
            break;
        case SDS_TYPE_8:
            SDS_HDR(8,s)->len = newlen;
            break;
        case SDS_TYPE_16:
            SDS_HDR(16,s)->len = newlen;
            break;
        case SDS_TYPE_32:
            SDS_HDR(32,s)->len = newlen;
            break;
        case SDS_TYPE_64:
            SDS_HDR(64,s)->len = newlen;
            break; Static inline void sdsinclen(SDS s, size_t inc) {unsigned char flags = s[-1]; switch(flags&SDS_TYPE_MASK) {case SDS_TYPE_5:
            {
                unsigned char *fp = ((unsigned char*)s)-1;
                unsigned char newlen = SDS_TYPE_5_LEN(flags)+inc;
                *fp = SDS_TYPE_5 | (newlen << SDS_TYPE_BITS);
            }
            break;
        case SDS_TYPE_8:
            SDS_HDR(8,s)->len += inc;
            break;
        case SDS_TYPE_16:
            SDS_HDR(16,s)->len += inc;
            break;
        case SDS_TYPE_32:
            SDS_HDR(32,s)->len += inc;
            break;
        case SDS_TYPE_64:
            SDS_HDR(64,s)->len += inc;
            break; Static inline size_t sdsalloc(const SDS s) {unsigned char flags = s[-1]; switch(flags&SDS_TYPE_MASK) {case SDS_TYPE_5:
            return SDS_TYPE_5_LEN(flags);
        case SDS_TYPE_8:
            return SDS_HDR(8,s)->alloc;
        case SDS_TYPE_16:
            return SDS_HDR(16,s)->alloc;
        case SDS_TYPE_32:
            return SDS_HDR(32,s)->alloc;
        case SDS_TYPE_64:
            return SDS_HDR(64,s)->alloc;
    }
    return0; } static inline void sdssetalloc(SDS s, size_t newlen) {unsigned char flags = s[-1]; switch(flags&SDS_TYPE_MASK) {case SDS_TYPE_5:
            /* Nothing to do, this type has no total allocation info. */
            break;
        case SDS_TYPE_8:
            SDS_HDR(8,s)->alloc = newlen;
            break;
        case SDS_TYPE_16:
            SDS_HDR(16,s)->alloc = newlen;
            break;
        case SDS_TYPE_32:
            SDS_HDR(32,s)->alloc = newlen;
            break;
        case SDS_TYPE_64:
            SDS_HDR(64,s)->alloc = newlen;
            break; }}Copy the code

Create an object

We create the object using the sdsnew method, which determines the initial size by determining whether init is empty, and then call the sdsnew method, which determines the type based on the length. Then allocate the corresponding memory resources according to the type, and finally append the C language terminator ‘\0’.

sds sdsnew(const char *init) {
    size_t initlen = (init == NULL) ? 0 : strlen(init);
    return sdsnewlen(init, initlen);
}


sds sdsnewlen(const void *init, size_t initlen) {
    void *sh;
    sds s;
    char type= sdsReqType(initlen); // Determine the type by the length /* empty string, using sdshdr8, this is the empirical way, when you want to construct an empty string to fit more than 32 strings */if (type == SDS_TYPE_5 && initlen == 0) type = SDS_TYPE_8;
    int hdrlen = sdsHdrSize(type); // To the next method, we have put them together unsigned char *fp; Sh = s_malloc(hdrlen+initlen+1);if(! init) memset(sh, 0, hdrlen+initlen+1);if (sh == NULL) returnNULL; s = (char*)sh+hdrlen; fp = ((unsigned char*)s)-1; Call SDS_HDR_VAR to assign values to different structures, such as len, total length alloc switch(type) {
        case SDS_TYPE_5: {
            *fp = type | (initlen << SDS_TYPE_BITS);
            break;
        }
        case SDS_TYPE_8: {
            SDS_HDR_VAR(8,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_16: {
            SDS_HDR_VAR(16,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_32: {
            SDS_HDR_VAR(32,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break;
        }
        case SDS_TYPE_64: {
            SDS_HDR_VAR(64,s);
            sh->len = initlen;
            sh->alloc = initlen;
            *fp = type;
            break; }}if(initlen && init) memcpy(s, init, initlen); // Finally append'\ 0' 
    s[initlen] = '\ 0';
    returns; } static inline char sdsReqType(size_t string_size) {if (string_size < 1<<5)
        return SDS_TYPE_5;
    if (string_size < 1<<8)
        return SDS_TYPE_8;
    if (string_size < 1<<16)
        return SDS_TYPE_16;
#if (LONG_MAX == LLONG_MAX)
    if (string_size < 1ll<<32)
        return SDS_TYPE_32;
#endif
    return SDS_TYPE_64;
}
Copy the code

delete

String deletions do not reclaim memory directly. Instead, they modify characters to be null, which is lazily freed for future use. The sdsNewlen method above is called again when the sdsempty method is called.

/* Modify the SDS string to make it empty (zero length). * However, all existing buffers are not discarded, but are set to free space * so that the next Append operation will not need to be allocated * When shortening SDS saved strings, the program does not immediately use full memory allocation to reclaim shortened extra bytes and wait for future use. void sdsclear(sds s) { sdssetlen(s, 0); s[0] ='\ 0';
}

sds sdsempty(void) {
    return sdsnewlen("", 0); }Copy the code

Add character (expand) key!!

Add string, sdSCat input parameters are SDS and string t, first call sdsMakeRoomFor expansion method, then append new string, finally add ending ‘\0’. So let’s see how does that work in the expansion method? The first step is to call the sdsavail method of the common method to get how much free space is left. If the free space is greater than the length of the string t to be added, it is returned without expansion. If the free space is insufficient, you want to expand it. If the current string size is smaller than 1 MB, double the size. If the current string size is larger than 1 MB, add 1 MB. The third determines whether the data type after the string is still the same as the original one, and if so, nothing is wrong. If not, you want to create a new SDSHDR and move the existing data over.

Isn’t that a little abstract? For example, STR is now a string of hello, currently sdshdr8, total length 50, used 6, free 44. Now I want to add the character t with length 50. The first step is to see if I want to expand it. 50 is obviously larger than 44, so I need to expand it. In the second step, the length of STR is less than 1M, so the length is doubled. The new length is 50 x 2=100. Is the SDSHDR type of step 3 50+50 still SDSHDR8? Obviously it’s still SDSHDR8, so don’t migrate the data, just add t on the original basis.

sds sdscat(sds s, const char *t) {
    returnsdscatlen(s, t, strlen(t)); } sdsdscatlen (SDS s, const void *t, size_t len) {// call sdslen(sds.h); S = sdsMakeRoomFor(s,len);if (s == NULL) return NULL;
    memcpy(s+curlen, t, len);
    sdssetlen(s, curlen+len);
    s[curlen+len] = '\ 0';
    returns; } sds sdsMakeRoomFor(sds s, size_t addlen) { void *sh, *newsh; // Call sds.h, obtain the idle length alloc size_t Avail = sdsavail(s); size_t len, newlen; chartype, oldtype = s[-1] & SDS_TYPE_MASK; int hdrlen; // If the idle length is larger than that to be added, the value is returned without expansionif (avail >= addlen) returns; Len = sdslen(s); len = sdslen(s); sh = (char*)s-sdsHdrSize(oldtype); Newlen = (len+addlen); //#define SDS_MAX_PREALLOC (1024*1024) // If the new length is less than 1024 x 1024, expand it twiceif (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else// If the new length is greater than 1024*1024, add 2014*1024 newlen += SDS_MAX_PREALLOC; // Compute the new type based on the lengthtype = sdsReqType(newlen);

    /* Don't use type 5: the user is appending to the string and type 5 is * not able to remember empty space, so sdsMakeRoomFor() must be called * at every appending operation. */ if (type == SDS_TYPE_5) type = SDS_TYPE_8; Hdrlen = sdsHdrSize(type); If (oldType ==type) {newsh = s_realloc(sh, hdrlen+newlen+1); if (newsh == NULL) return NULL; s = (char*)newsh+hdrlen; Newsh = s_malloc(hdrlen+newlen+1); newsh = s_malloc(hdrlen+ 1); if (newsh == NULL) return NULL; memcpy((char*)newsh+hdrlen, s, len+1); s_free(sh); s = (char*)newsh+hdrlen; s[-1] = type; sdssetlen(s, len); } // Set the new total length sdssetalloc(s, newlen); return s; } static inline int sdsHdrSize(char type) {switch(type&SDS_TYPE_MASK) {case SDS_TYPE_5: static inline int sdsHdrSize(char type) {switch(type&SDS_TYPE_MASK) {case SDS_TYPE_5: return sizeof(struct sdshdr5); case SDS_TYPE_8: return sizeof(struct sdshdr8); case SDS_TYPE_16: return sizeof(struct sdshdr16); case SDS_TYPE_32: return sizeof(struct sdshdr32); case SDS_TYPE_64: return sizeof(struct sdshdr64); } return 0; }Copy the code

conclusion

This article mainly describes the bottom of Redis SDS, including what IS SDS, compared with the traditional C language advantages, specific logical diagram, common methods (including creation, deletion, expansion, etc.). We also know the difference between Redis embstr and raw. If you feel that writing is also good, trouble to give a praise 👍, your recognition is the power of my writing!

If you think something is wrong, please feel free to comment.

All right, bye.

And a focus on

The resources

[Redis source code analysis] a question about whether SDSHDR5 is used

How to read the Redis source code?

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

String low-level implementation – dynamic String SDS

Redis source code analysis series of articles

preface

Use the API

Differences between EMbstr and RAW

The definition of SDSHdr

SDS logical diagram

The advantage of SDS

Faster to get string length

Data is secure and will not be truncated

SDS critical code analysis

Getting common values (abstracting common methods)

Create an object

delete

Add character (expand) key!!

conclusion

And a focus on

The resources

String low-level implementation – dynamic String SDS

Redis source code analysis series of articles

preface

Use the API

Differences between EMbstr and RAW

The definition of SDSHdr

SDS logical diagram

The advantage of SDS

Faster to get string length

Data is secure and will not be truncated

SDS critical code analysis

Getting common values (abstracting common methods)

Create an object

delete

Add character (expand) key!!

conclusion

And a focus on

The resources

Related Posts

707. Design linked lists

The way PHP code is reused

By the end of this article, you should be able to build a good project structure!