Author: Pushy

Pushy. site/2019/12/21/…

Copyright Notice: All articles on this blog are licensed under a CC BY-NC-SA 3.0 license unless otherwise stated. Reprint please indicate the source!

1. What is SDS

As we all know, of Redis’s five data deconstructions, the simplest are strings:

redis> set msg "Hello World"
Copy the code

Instead of using C’s traditional string representation, Redis built an abstract data structure called Simple Dynamic String (SDS).

Execute the Redis command above to create a key-value pair in the Server database, that is:

  • SDS with key “MSG”;
  • SDS with the value “Hello World”.

The sdS. h header of Redis defines the structure of SDS:

struct sdshdr {
    // Records the number of bytes currently used in the BUF array
    unsigned int len;
    // Record the free space in the buF array
    unsigned int free;
    // Array of bytes
    char buf[];

};
Copy the code

As can be seen, SDS describes the current storage status of byte array BUF through len and free attribute values, which plays a great role in later expansion and other operations, and can also obtain the length of string with the complexity of O(1). (As we know, the string with C itself does not record the length information. Can only iterate over the entire string statistics).

So why did Redis implement its own string data deconstruction? Here slowly to study!

2. Advantages of SDS

Prevent buffer overflow

In addition to the complexity of retrieving the length of a string, another problem with C strings not recording their length information is that they tend to run out of memory. For example, append the motto string to the s1 string using C’s built-in strcat method:

void wrong_strcat(a) {
    char *s1, *s2;

    s1 = malloc(5 * sizeof(char));
    strcpy(s1, "Hello");
    s2 = malloc(5 * sizeof(char));
    strcpy(s2, "World");

    char *motto = " To be or not to be, this is a question.";
    s1 = strcat(s1, motto);

    printf("s1 = %s \n", s1);
    printf("s2 = %s \n", s2);
}

// s1 = Hello To be or not to be, this is a question. 
// s2 = s a question. 
Copy the code

But the output is unexpected. We only want to change the s1 string, and s2 string is also changed. This is because the STRCAT method assumes that the user has allocated enough memory for S1 to hold the content of the Motto string before execution. If this assumption is not true, buffer overflows will occur.

Through Debug, we can see that the initial position of s1 variable memory is 94458843619936 (base 10), and the initial position of S2 is 94458843619968, which are adjacent memory blocks:

Therefore, once the duration of motto appended to S1 by strcat is greater than the memory address interval between S1 and S2, it will change to the value of s2 variable. The correct thing to do is to resize s1 before strcat so that s2 is not changed:

void correct_strcat(a) {
    char *s1, *s2;

    s1 = malloc(5 * sizeof(char));
    strcpy(s1, "Hello");
    s2 = malloc(5 * sizeof(char));
    strcpy(s2, "World");

    char *motto = " To be or not to be, this is a question.";
    // Create a motto for a variable (1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000).
    s1 = realloc(s1, (strlen(motto) * sizeof(char)) + 1);
    s1 = strcat(s1, motto);

    printf("s1 = %s \n", s1);
    printf("s2 = %s \n", s2);
}

// s1 = Hello To be or not to be, this is a question. 
// s2 = World 
Copy the code

As can be seen, the starting location of s1 variable memory address is 94806242149024 (decimal), and s2 variable memory address is 94806242148992. The space between the memory addresses of S1 and S2 is large enough to hold a motto string:

Different from C string, the space allocation strategy of SDS completely eliminates the possibility of buffer overflow, which is implemented in SDS.C. By reading the source code, we can understand that SDS can prevent buffer overflow because when sdsMakeRoomFor is called again, it will check whether SDS space meets the required requirements (i.e., free >= addlen condition). If it does, Redis will expand the SDS space to the size required to perform the actual concat operation, thus avoiding overflow:

// Similar to the C string.h/strcat function, it appends a C string to SDS
sds sdscat(sds s, const char *t) {
    return sdscatlen(s, t, strlen(t));
}

sds sdscatlen(sds s, const char *t, size_t len) {
    struct sdshdr *sh;
    size_t curlen = sdslen(s);  // Get the len attribute value of SDS

    s = sdsMakeRoomFor(s, len);
    if (s == NULL) return NULL;
    // Convert SDS to SDSHDR, as described below
    sh = (void *) (s - sizeof(struct sdshdr));
    // Copy the string t into the memory address space starting with s+curlen
    memcpy(s + curlen, t, len);
    sh->len = curlen + len;     // Concat length = original length + len
    sh->free = sh->free - len;  // free after concat = original free space size -len
    s[curlen + len] = '\ 0';     // As with the C string, it ends with a null character \0
    return s;
}

// Make sure there is enough space for the added C string, and additional unused space is allocated
// This eliminates the possibility of buffer overflows
sds sdsMakeRoomFor(sds s, size_t addlen) {
    struct sdshdr *sh, *newsh;
    size_t free = sdsavail(s);  // The current free space size
    size_t len, newlen;

    if (free >= addlen) {
        /* If there is enough free space to accommodate the size of the added C string, return it, otherwise the following code is executed to extend the buf byte array */
        return s;
    }
    len = sdslen(s);  // The number of bytes currently in use
    sh = (void *) (s - (sizeof(struct sdshdr)));
    newlen = (len + addlen);  // The new length of bytes after concatenation

    if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC;
    newsh = realloc(sh, sizeof(struct sdshdr) + newlen + 1);
    if (newsh == NULL) return NULL; // Failed to allocate memory

    /* Free space for new SDS = new size - size of concatenated C string */
    newsh->free = newlen - len;
    return newsh->buf;
}
Copy the code

Void *) (s-sizeof (struct SDSHDR)); Struct SDSHDR sh = void (s-(sizeof(struct SDSHDR))

Reduce the number of memory reallocations caused by modifying characters

For a C string containing N characters, the bottom layer is always implemented by N+1 contiguous memory arrays. Because of this relationship, each change requires a memory reallocation of the C string array:

  • In the case of concatenation: expand the size of the underlying array to prevent buffer overflows (mentioned earlier);
  • In the case of truncation, unused memory space needs to be freed to prevent memory leaks.

Redis is a database that is frequently accessed and modified. SDS is also needed to reduce the performance impact of memory reallocation caused by modified characters. Because in SDS, the length of the BUF array is not necessarily the number of strings + 1, it can contain unused characters, recorded by the free attribute value. With unused space, SDS implements the following two optimization strategies:

ⅰ. Space pre-allocation

Space preallocation is used to optimize operations for SDS growth: When an SDS is modified and needs to be spatially expanded, Redis not only allocates the space necessary for the modification to the SDS, but also allocates additional unused space to the SDS.

As you can see from the previous sdsMakeRoomFor method, there are two strategies for the amount of extra unused space allocated:

  • SDS is less thanSDS_MAX_PREALLOCLen will be equal to free;
  • SDS greater than or equal toSDS_MAX_PREALLOC: Direct distributionSDS_MAX_PREALLOCSize.
sds sdsMakeRoomFor(sds s, const char *t, size_t len) {...if (newlen < SDS_MAX_PREALLOC)
        newlen *= 2;
    else
        newlen += SDS_MAX_PREALLOC;
    newsh = realloc(sh, sizeof(struct sdshdr) + newlen + 1);
    if (newsh == NULL) return NULL;
    newsh->free = newlen - len;
    return newsh->buf;
}
Copy the code

With a space preallocation strategy, Redis can reduce the number of memory reallocations required to perform consecutive string growth operations.

ⅱ. Release of inert space

Lazy space free is used to optimize SDS string shortening. When it is necessary to shorten the string saved by SDS, Redis does not immediately use memory reallocation to reclaim the extra bytes of shortening, but instead uses the free property to record these bytes and wait for them to be used.

For example, you can see that the extra 22 bytes of space are not immediately reclaimed after sdSTRIm is executed, but instead are stored as the value of the free variable. When sdSCat is executed, the 22 bytes of space previously freed is enough to accommodate the 11 bytes of the appended C string, and there is no need for extended reallocation of memory space.

#include "src/sds.h"

int main(a) {
    // sds{len = 32, free = 0, buf = "AA... AA.a.aa.aHelloWorld :::"}
    s = sdsnew("AA... AA.a.aa.aHelloWorld :::");  
    // sds{len = 10, free = 22, buf = "HelloWorld"}
    s = sdstrim(s, "Aa. :");  
    // sds{len = 21, free = 11, buf = "HelloWorld! I'm Redis"}
    s = sdscat(s, ! "" I'm Redis");   
    return 0;
}
Copy the code

By using a lazy space free strategy, SDS avoids the memory reallocation operations required to shorten strings and provides optimization for future growth operations. At the same time, SDS also has corresponding API to actually free up SDS unused space.

Binary security

The C string must conform to some encoding, and the string cannot contain a null character (\0) except for the end of the string, otherwise it will be mistaken for the end of the string. These limitations make it impossible to save binary data such as images, audio, and so on.

But Redis can store binary data because SDS uses the len attribute value instead of the null character to determine whether a string is terminated.

Compatible with some C string functions

We found that the byte array of SDS has some similarities to the C string, such as ending with \0 (but not with this flag as the end of the string). This allows SDS to reuse functions defined by the

library:

#include <stdio.h>
#include <strings.h>
#include "src/sds.h"

int main(a) {
    s = sdsnew("Cat");
    // Compare size according to character set
    int ret = strcasecmp(s, "Dog");
    printf("%d", ret);
    return 0;
}
Copy the code

3. Summary

After seeing the realization of Redis SDS, I finally know that Redis is only so fast, certainly inseparable from the NETWORK I/O model of Epoll, but also inseparable from the simple data structure of the underlying optimization.

The beauty of SDS is that it coordinates the expansion and scaling of byte arrays through the len and free attribute values, providing better performance than too many C strings. What is awesome? This is awesome!