A list,

Instead of using the traditional string representation of C directly (null-terminated character arrays), Redis built itself an abstract type called Simple Dynamic String (SDS) and used SDS as the default string representation of Redis.

  • C strings are used in Redis only where you don’t need to modify strings, such as printing logs.
  • Other things like key-value pairs, keys and values are SDS.

Eg:

redis> SET msg "hello world"
OK
Copy the code

The underlying implementation of the key-value pair’s key “MSG” and value “hello world” is SDS.

Second, the definition of

1) Before version 3.2

SDS prior to REDIS3.2 was defined as follows:

struct sdshdr { unsigned int len; // SDS current length unsigned int free; // SDS not currently used length char buf[]; // where the SDS is stored};Copy the code

Note: SDSHDR and SDS are one – to – one correspondence.

As len is an unsigned integer, the maximum length of an SDS string is 2^(8*sizeof(unsigned int)), which is 2^ 16 or 2^ 32.The figure above is an example of an SDS data structure.

2) After version 3.2

Version 3.2 introduced five SDSHDR types, which are selected based on length when creating.

struct __attribute__ ((__packed__)) sdshdr5 { unsigned char flags; // Total 8 bits, lower 3 bits for real flags(type) and higher 5 bits for len(length). char buf[]; // where the SDS is stored}; struct __attribute__ ((__packed__)) sdshdr8 { uint8_t len; Uint8_t alloc; // Indicates the size of memory allocated for SDS unsigned char flags; Char buf[]; // where the SDS is stored}; struct __attribute__ ((__packed__)) sdshdr16 { uint16_t len; /* used */ uint16_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 lsb of type, 5 unused bits */ char buf[]; }; struct __attribute__ ((__packed__)) sdshdr32 { uint32_t len; /* used */ uint32_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 lsb of type, 5 unused bits */ char buf[]; }; struct __attribute__ ((__packed__)) sdshdr64 { uint64_t len; /* used */ uint64_t alloc; /* excluding the header and null terminator */ unsigned char flags; /* 3 lsb of type, 5 unused bits */ char buf[]; };Copy the code
  • sdshdr5
    • Flag: unsigned char. The lower 3 bits are used to hold the flag type, because char is 1 byte (8 bits) and flag is a maximum of 4 bits (up to 7 bits).
    • At least 3 bits are needed to indicate:

000: sdSHDR5 001: sdSHDR8 010: sdSHdr16 011: sdSHdr32 100: sdSHdr64 – The high 5 bits are used to indicate the length, meaning that the maximum length of the string represented by this type of SDSHDR is 2^5. -buf [] : stores the actual string

  • Other SDSHDR
    • Len: Current length of SDS, excluding terminator ‘\0’
    • Alloc: Size of allocated memory, should be greater than or equal to len, excluding terminator ‘\0’
    • Flags: indicates the current SDSHDR type. The lower three bits are sufficient and the higher five bits are free.
    • Buf [] : stores the actual string
  • __attribute__ ((packed))

This keyword unaligns bytes. Let our structure, in a compact arrangement of memory, to save memory overhead.

Three, the source part

1. Find SDS length (SDSLen)

#define SDS_HDR(T,s) ((struct sdshdr##T *)((s)-(sizeof(struct sdshdr##T)))) // Return the length of SDSHDR with the flags member of sdshdr5. #define SDS_TYPE_5_LEN(f) ((f)>>SDS_TYPE_BITS) #define SDS_TYPE_MASK 7 #define SDS_TYPE_BITS 3 static inline size_t sdslen(const sds s) { unsigned char flags = s[-1]; // SDSHDR flags member variable switch(flags&SDS_TYPE_MASK) {case SDS_TYPE_5: return SDS_TYPE_5_LEN(flags); case SDS_TYPE_8: return SDS_HDR(8,s)->len; Case SDS_TYPE_16: return SDS_HDR(16,s)->len; case SDS_TYPE_32: return SDS_HDR(32,s)->len; case SDS_TYPE_64: return SDS_HDR(64,s)->len; } return 0; }Copy the code
  • The double hash ## in the first line means to concatenate two tokens in a macro definition. After concatenating the tokens, the compiler recognizes the tokens as one.
  • To determine what type of SDSHDR SDS is, flags&SDS_TYPE_MASK results are compared to SDS_TYPE_n.

We need flags and masks to do bitwise and bitwise operations here because sdSHdr5 is a special case where the top 5 bits are used to indicate length, so we need to use masks to clear them.

  • S [-1] : the pointer to s is moved one bit left. Since memory alignment is disabled, s points to the array buf[], which happens to be the FLAGS member of SDSHDR.

2. Create an SDS

There are four ways to create an SDS, and the first three are listed below:

sds sdsempty(void) {
    return sdsnewlen("",0);
}

/* Create a new sds string starting from a null terminated C string. */
sds sdsnew(const char *init) {
    size_t initlen = (init == NULL) ? 0 : strlen(init);
    return sdsnewlen(init, initlen);
}

/* Duplicate an sds string. */
sds sdsdup(const sds s) {
    return sdsnewlen(s, sdslen(s));
}
Copy the code

You can see that they both call the method sdsNewlen. Let’s see how this is implemented.

// The contents of the memory that the init pointer points to are truncated to create a new SDS. This function is binary safe SDS sdsnewlen(const void *init, size_t initlen) {void *sh; // SDSHDR pointer SDS s; // pointer to buf[] char type = sdsReqType(initlen); If (type == sds_type_5&& initlen == 0) type = SDS_TYPE_8; // If initlen is empty and sdshdr5 is of type, set the type to sdshdr8 int hdrlen = sdsHdrSize(type); // The size of each SDSHDR type is different, and the size of the SDSHDR type is returned to calculate the space allocated (this space does not include the character array buf[]) unsigned char *fp; Sh = s_malloc(hdrlen+initlen+1); // Apply a contiguous space in the heap for SDSHDR and its SDS, +1 because '\0' if (! init) memset(sh, 0, hdrlen+initlen+1); If (sh == NULL) return NULL; // If init is NULL, the entire SDSHDR is initialized with 0. s = (char*)sh+hdrlen; Buf [] fp = ((unsigned char*)s)-1; / / find the location of the flags, equivalent to & s [1] the switch (type) {case SDS_TYPE_5: {* fp = type | (initlen < < SDS_TYPE_BITS); //initlen moves 3 bits left to 5 bits high to make room for type, and breaks with type; } case SDS_TYPE_8: { SDS_HDR_VAR(8,s); Struct sdshdr##T; // declare a new local variable sh (struct sdshdr##T) in switch scope. sh->len = initlen; sh->alloc = initlen; *fp = type; // Set flags break; } case SDS_TYPE_16: { SDS_HDR_VAR(16,s); sh->len = initlen; sh->alloc = initlen; *fp = type; break; } case SDS_TYPE_32: { SDS_HDR_VAR(32,s); sh->len = initlen; sh->alloc = initlen; *fp = type; break; } case SDS_TYPE_64: { SDS_HDR_VAR(64,s); sh->len = initlen; sh->alloc = initlen; *fp = type; break; } } if (initlen && init) memcpy(s, init, initlen); //memcpy will not stop because of '\0', support binary data copy s[initlen] = '\0'; // Whether binary data or not, the end will be '\0' return s; } static inline int sdsHdrSize(char type) { switch(type&SDS_TYPE_MASK) { case SDS_TYPE_5: return sizeof(struct sdshdr5); Case SDS_TYPE_8: return sizeof(struct sdshdr8); case SDS_TYPE_8: return sizeof(struct sdshdr8) case SDS_TYPE_16: return sizeof(struct sdshdr16); case SDS_TYPE_32: return sizeof(struct sdshdr32); case SDS_TYPE_64: return sizeof(struct sdshdr64); } return 0; }Copy the code

The pointer points as shown below:

3. Space pre-allocation (sdsMakeRoomFor)

Increments a string

#define SDS_MAX_PREALLOC (1024*1024) sds sdsMakeRoomFor(sds s, size_t addlen) { void *sh, *newsh; size_t avail = sdsavail(s); Size_t len, newlen; char type, oldtype = s[-1] & SDS_TYPE_MASK; SDSHDR int hdrlen; size_t usable; if (avail >= addlen) return s; // Return len = sdslen(s) if there is enough space; Sh = (char*) s-sdshdrSize (oldType); Newlen = (len+addlen); // New SDS length if (newlen < SDS_MAX_PREALLOC)// If the new length is less than 1MB, the new space is doubled, otherwise the new length is increased by 1MB newlen *= 2; else newlen += SDS_MAX_PREALLOC; type = sdsReqType(newlen); If (type == SDS_TYPE_5) type = SDS_TYPE_8; // If the SDS type is SDS_TYPE_5, it will be transformed because this type does not record empty space, i.e. there is no corresponding member variable in SDSHDR hdrlen = sdsHdrSize(type); If (oldType ==type) {newsh = s_realloc_usable(sh, hdrlen+newlen+1, &usable); // If there is enough space in the original address to complete the reallocation, then it returns the same new address as the old address passed in; Otherwise, it allocates a new address block and moves the data. if (newsh == NULL) return NULL; s = (char*)newsh+hdrlen; Else {// Type change newsh = s_malloc_usable(hdrlen+newlen+1, &usable); If (newsh == NULL) return NULL; memcpy((char*)newsh+hdrlen, s, len+1); // copy data to new memory address s_free(sh); S = (char*)newsh+hdrlen; // Move the pointer to the hdrlen bit, pointing to the buf[] data s[-1] = type; sdssetlen(s, len); // Set len} usable = usable-hdrlen-1; if (usable > sdsTypeMaxSize(type)) usable = sdsTypeMaxSize(type); sdssetalloc(s, usable); // set SDSHDR to alloc return s; }Copy the code

4. Extended String (sdSCATlen)

Performs string concatenation

sds sdscatlen(sds s, const void *t, size_t len) { size_t curlen = sdslen(s); // Current string length s = sdsMakeRoomFor(s,len); If (s == NULL) return NULL; memcpy(s+curlen, t, len); Sdssetlen (s, curlen+len); curlen (s, curlen+len); // Set len s[curlen+len] = '\0'; // String ends with '\0' return s; }Copy the code

5. Lazy Space Release (SDSTRIM)

As you can see from the code below, this function just removes the left and right whitespace characters and changes the len attribute in SDSHDR. It does not really change the size of the memory allocated by SDS. The remaining space is used for future string extensions.

sds sdstrim(sds s, const char *cset) { char *start, *end, *sp, *ep; size_t len; sp = start = s; ep = end = s+sdslen(s)-1; while(sp <= end && strchr(cset, *sp)) sp++; while(ep > sp && strchr(cset, *ep)) ep--; len = (sp > ep) ? 0 : ((ep-sp)+1); if (s ! = sp) memmove(s, sp, len); s[len] = '\0'; sdssetlen(s,len); return s; }Copy the code

6. Free the memory (sdsRemoveFreeSpace)

This function compresses memory so that alloc=len. If the type is smaller, open another memory copy, if the type is unchanged, realloc. SdsMakeRoomFor is basically the reverse operation, no further details.

sds sdsRemoveFreeSpace(sds s) {
    void *sh, *newsh;
    char type, oldtype = s[-1] & SDS_TYPE_MASK;
    int hdrlen, oldhdrlen = sdsHdrSize(oldtype);
    size_t len = sdslen(s);
    size_t avail = sdsavail(s);
    sh = (char*)s-oldhdrlen;

    if (avail == 0) return s;

    type = sdsReqType(len);
    hdrlen = sdsHdrSize(type);

    if (oldtype==type || type > SDS_TYPE_8) {
        newsh = s_realloc(sh, oldhdrlen+len+1);
        if (newsh == NULL) return NULL;
        s = (char*)newsh+oldhdrlen;
    } else {
        newsh = s_malloc(hdrlen+len+1);
        if (newsh == NULL) return NULL;
        memcpy((char*)newsh+hdrlen, s, len+1);
        s_free(sh);
        s = (char*)newsh+hdrlen;
        s[-1] = type;
        sdssetlen(s, len);
    }
    sdssetalloc(s, len);
    return s;
}

Copy the code

4, SDS and C string difference

1. Constant complexity Gets the length of the string

  • Because a C string does not record its own length, to obtain the length of a C string, the program must traverse the entire string, counting every character it encounters until it encounters a null character representing the end of the string, an operation of O(N).

As shown below:

  • Unlike C strings, obtaining an SDS length is O(1) because SDS records the length of SDS itself in the LEN attribute.

2. Prevent buffer overflow

  • Since C strings do not record their length, strcat assumes that the user has allocated enough memory to hold all the contents of the SRC string when the function is executed, and that a buffer overflow will occur if this assumption is not made.

As shown in the figure below, s1 and S2 are adjacent, and if S1 were to expand, it would affect the S2 string, which is not allowed to happen.

  • Unlike C strings, SDS’s space allocation strategy completely eliminates the possibility of buffer overflows: When THE SDS API needs to modify the SDS, the API will check whether the SDS space meets the requirements for modification. If not, the API will automatically expand the SDS space to the required size, and then perform the actual modification operation. Therefore, using SDS does not need to manually modify the SDS space. There will also be no buffer overflow problems described earlier.

3. Reduce the number of memory reallocation times caused by string modification

  • Because a C string does not record its own length, the underlying implementation of a C string containing N characters is always an array of N+1 characters long (an extra character space is used to hold null characters). Because of this correlation between the length of the C string and the length of the underlying array, every time a C string is increased or shortened, the program always reallocates the array that holds the C string:
    • If a program performs an operation that grows a string, such as append, it needs to increase the size of the underlying array by reallocating memory before performing this operation — forgetting this step can result in a buffer overflow.
    • If the program is performing operations that shorten the string, such as truncation (trim), then after this operation, the program needs to reallocate memory to free up space that is no longer used by the string — forgetting this step can cause a memory leak.
  • To avoid this defect of C strings, SDS disassociates the string length from the underlying array length by using no space: In SDS, the length of the BUF array is not necessarily the number of characters plus one. The array can contain unused bytes, and the total number of bytes is recorded by the SDS alloc property (len currently used, and unused free space). SDS realizes two optimization strategies of space pre-allocation and inert space release through unused space.

3.1 Space Preallocation Space preallocation is used to optimize the string growth operation of SDS: When the SDS API modifies an SDS and requires space expansion, the program not only allocates the space required for the modification to the SDS, but also allocates additional unused space to the SDS.

  • If the length of SDS (the value of len) is less than 1MB after modification of SDS, the program allocates the same amount of unused space as Len, and the value of SDS LEN is half that of alloc.
  • If the SDS is greater than or equal to 1MB in length after modification, the program allocates 1MB of unused space.
  • Before expanding the SDS space, the SDS API checks to see if the unused space is sufficient, and if so, the API uses the unused space directly without performing memory reallocation.
  • With this preallocation strategy, SDS reduces the number of memory reallocations required to grow strings N times in a row from a certain N to a maximum of N.

3.2 Lazy space Free Lazy space free is used to optimize the string shortening operation of SDS: When the SDS API needs to shorten the string saved by SDS, the program does not immediately use memory reallocation to reclaim the shortened extra bytes, but instead uses the free property to record the number of bytes and wait for future use.

  • SDS also provides apis that allow us to actually free up SDS unused space when we need it, so we don’t have to worry about wasted memory due to lazy space free strategies.

4. Binary security

  • The characters in a C string must conform to some encoding (such as ASCII), and the string must contain no null characters except the end of the string. Otherwise, the first null character read by the program will be mistaken for the end of the string. These limitations allow C strings to hold only text data. It cannot save binary data such as pictures, audio, video, and compressed files.
  • All SDS apis treat the data stored in the BUF array by SDS as binary, and the program does not restrict, filter, or make assumptions about the data as it was written and as it was read.

That’s why we call the BUF property of SDS a byte array — Redis doesn’t use this array to hold characters, it uses it to hold a series of binary data.

5. Compatible with some C string functions

Although SDS apis are binary safe, they all follow the null-terminating convention of C strings: These apis always set the end of the data stored by SDS to a null character, and always allocate an extra byte to hold the null character when assigning space to the BUF array, so that SDS that hold text data can reuse some of the functions defined by the <string.h> library.

5. Reference materials

  • Redis Design and Implementation
  • Lynnapan. Making. IO / 2017/07/14 /…
  • Blog.csdn.net/zgaoq/artic…