Recently, I have been reading the Redis design and implementation of Huangz Daishen.

I am ashamed to say that I have been using Redis in my work and purchased this book for a long time. I have read several books on MySQL, but this is the first one on Redis.


The first part of the data structure and object, divided into seven chapters on the simple dynamic string, linked list, dictionary, skip list, integer set, compressed list, object in Redis implementation and application.


This article only briefly describes SDS and its differences from strings in C.



Instead of using C’s traditional string representation directly (null-terminated character arrays, hereinafter referred to as C strings), Redis built an abstract type called Simple Dynamic String (SDS), SDS is used as the default string representation of Redis.


SDS structure is as follows:

struct sdshdr {

    // Records the number of bytes used in the BUF array
    // is equal to the length of the string saved by SDS
    int len;

    // Records the number of unused bytes in the BUF array
    int free;

    // An array of bytes to hold strings
    char buf[];

};
Copy the code


SDS has the following advantages over C strings:

<1>. O(1) Time complexity Obtains the length of the character string

Because a C string does not record its own length, to obtain a C string length, the program must traverse the entire string, counting every character it encounters until it encounters a null character representing the end of the string. This operation is O(N).

Unlike C strings, obtaining an SDS length is O(1) because SDS records the length of SDS itself in the LEN attribute.


<2>. Prevent buffer overflows

Another problem with C strings that do not record their length is that they tend to cause buffer overflow.

Unlike C strings, SDS’s space allocation strategy completely eliminates the possibility of buffer overflows: When THE SDS API needs to modify the SDS, the API will check whether the SDS space meets the requirements for modification. If not, the API will automatically expand the SDS space to the required size before the actual modification operation. So using SDS does not require manual modification of SDS space size, nor does it cause buffer overflow problems.


<3>. Reduces the number of memory reallocations when modifying strings

For a C string containing N characters, the underlying implementation of the C string is always an array of N+1 characters long (an extra character space is used to hold null characters).

Because of this correlation between the length of the C string and the length of the underlying array, every time a C string is modified (increased or shortened), the program always performs a memory reallocation of the array that holds the C string

Memory reallocation involves complex algorithms and may require system calls, so it is usually a time-consuming operation:

  • In general programs, if changing the length of a string is infrequent, it is acceptable to perform a memory reallocation for each change.

  • But Redis as a database, is often used to speed demanding, data are frequently modified occasions, if every time to modify the length of the string need to perform a memory redistribution, the execution time for the redistribution of memory alone would take up the greater part of the modified string used in time, if this change happen so frequently, There may also be an impact on performance.

To avoid this defect of C strings, SDS disassociates the string length from the underlying array length by using no space: In SDS, the length of the BUF array is not necessarily the number of characters plus one. The array can contain unused bytes, and the number of bytes is recorded by the FREE property of SDS.


<4>. Binary security

Characters in a C string must conform to some encoding (such as ASCII), and the string must not contain empty characters except for the end of the string, otherwise the first empty character read by the program will be mistaken for the end of the string — these limitations make C strings hold only text data. It cannot save binary data such as pictures, audio, video, and compressed files.

Although databases are commonly used to store textual data, binary data is also commonly used to store binary data. Therefore, to ensure that Redis can be used ina variety of scenarios, SDS apis are binary-safe: All SDS apis treat SDS data stored in buF arrays as binary, and the program does not restrict, filter, or make assumptions about the data as it was written and as it was read.

That’s why SDS’s BUF property is called a byte array — Redis doesn’t use this array to hold characters, it uses it to hold a series of binary data.


<5>. Compatible with some C string functions

The SDS API also follows the null-terminating convention of C strings: These apis always set the end of the data stored by SDS to a null character, and always allocate an extra byte to hold the null character when allocating space for the BUF array. This allows SDS that hold text data to reuse some of the functions defined by the <string.h> library. This eliminates the need to rewrite and avoid unnecessary code duplication.



Reference:

SDS and C strings

Redis SDS and C string difference