Redis, written in ANSI C, is a high-performance key-value database that can be used in databases, caching, and messaging middleware. Among them, the key in Redis key-value pair is string type, and the value in the key-value pair also has string type. String type is widely used in Redis. This paper mainly introduces the data structure of string — Simple Dynamic String (SDS).

SDS implementation

SDS data structure:

Struct SDSHDR {// struct SDSHDR; // buf remaining available length int free; Char buf[]; }Copy the code

The SDSHDR structure holds the len, free, and buf attributes, which record the used length of the character, the unused length, and the array that actually holds the string. Here is a new SDSHDR structure that holds the Hello World string:

struct sdshdr {
    len = 5;
    free = 0;
    buf = "hello\0"; 
}
Copy the code
  • The free property value of 0 indicates that this SDS has not allocated unused space.
  • The len attribute value of 5 means that the SDS holds a five-byte string.
  • The buf property is an array of type char, with the first five bytes holding ‘h’, ‘e’, ‘L’, ‘L’, ‘O’, and the last byte holding the null character ‘\0’.

SDS follows the convention that C strings end with an empty string, and the saved empty string bytes are not counted in the LEN attribute of SDS. Adding an empty string to the end of a string is done automatically by the SDS function, so the empty character is completely transparent to the user.

Through len attribute, the length of time complexity O(1) can be calculated. In addition, SDSHDR can reduce memory reallocation by allocating some extra space to buF and using free to record the length of unused space. This is an advantage of SDS over C strings.

Why doesn’t Redis use C to represent strings

Redis is developed in C, but for the most used strings, Redis does not use the traditional string representation of C, and uses a simple dynamic string (SDS) built by itself. In C, strings can be represented by an array of char ending at \0. For example, hello world can be expressed as “Hello world\0” in C. The length of an array is fixed after it is initialized. It cannot append a string or calculate the length:

  • I have to iterate through the array every time I compute the length of the string, so the time is order N.
  • Each append to a string requires a memory allocation to the string

SDS optimizes the appending character operation

Redis, as a database, has strict requirements on query speed and frequent data modification. If each string modification requires a memory allocation, it will take a lot of time. So Redis chose SDS over C strings, which can reduce memory allocation for appending characters. To illustrate the changes within SDS when:

redis> set msg "hello world"
OK

redis> append msg " again"
(integer)18

redis> get msg
"hello world again"
Copy the code

First, the set command creates and saves hello World to an SDSHDR with the following values:

struct sdshdr {
     len = 11;
     free = 0;
     buf = "hello world\0";
}
Copy the code

When the append command is executed, the corresponding SDSHDR is updated and the string “again” is appended to the original “hello world” :

struct sdshdr {
     len = 17;
     free = 17;
     buf = "hello world again\0                ";
}
Copy the code

When the set command is called to create the SDSHDR, Redis does not allocate extra space to the SDSHDR and the free attribute is 0. After the append operation, Redis allocates more than twice the required space to the BUF.

After executing the append command, it takes 17 + 1 bytes to save “Hello World again”, but the program allocates 17 + 17 + 1 = 35 bytes to SDSHDR, and if it appends to SDSHDR later, As long as the appending length does not exceed the free attribute value, there is no need to reallocate memory to buF.

For example, executing a later command does not cause buF to reallocate memory because the newly appended string is less than 17:

redis> append msg  " again"
(integer) 23
Copy the code

The corresponding SDSHDR structure is as follows:

struct sdshdr {
     len = 23;
     free = 11;
     buf = "hello world again again\0               ";
}
Copy the code

Redis memory allocation can be viewed source sdS. s/sdsMakeRoomFor, sdsMakeRoomFor function describes the strategy of memory allocation, the following function pseudocode:

// addlen: SDS sdsMakeRoomFor(SDSHDR, addlen) {if (free >= addlen) return s; Newlen = (len+addlen); // If the length of the new character is less than SDS_MAX_PREALLOC, allocate twice the new character space. If (newlen < SDS_MAX_PREALLOC) newlen *= 2; else newlen += SDS_MAX_PREALLOC; Newsh = zrealloc(sh, sizeof(struct SDSHDR)+newlen+1); // Update the free attribute newsh.free = newlen-len; return newsh; }Copy the code

For character shortening, Redis saves the shortened string and does not reallocate the memory. Instead, it uses the free attribute to record the shortened character length.

conclusion

Why does Redis use SDS instead of C strings? SDS has two advantages:

  • Compute character length, C string complexity O (n), SDS complexity O (1)
  • For character appending operations, C strings need to reallocate the memory each time, while SDS performs dynamic expansion each time. When the number of characters added is smaller than the number of free characters, the content is not allocated, reducing the waiting time of the system

reference

Redis design and implementation

If you find this article helpful, please give it a thumbs up!