String

Instead of using C’s traditional string representation directly, Redis built an abstract type called Simple Dynamic String (SDS). I’ll explain why Redis builds SDS instead of C strings. It’s all about improving Redis performance.

The definition of SDS

Here I will first give the definition of SDS, and then I will analyze its unique properties. You can think about the following points when you look at it: why do you need this property? What does this property do? What happens without this property? In the following data structure analysis, I hope you can also think about the design of Redis with similar problems.

struct sdshdr {
    // Records the length of the buF array that has been used
    // The length of the SDS string
    int len;
    
    // Record the unused length of the BUF array
    int free;
    
    // An array of bytes to hold strings
    char buf [] ;
}
Copy the code

A simple SDS string

Advantages of SDS strings over C strings

SDS string has several more properties than C string, so the efficiency of many operations of SDS string is much faster than C string, which is a typical space-time strategy.

Constant complexity gets the length of the string

As you know, to get the length of a C string, you have to iterate over the entire string.

The following figure shows the process of obtaining string length in C

SDS uses the Len attribute to change the time complexity of obtaining the length of an SDS string from O(N) to O(1), ensuring that obtaining the length of a string does not become a performance bottleneck for Redis.

Prevent buffer overflow

In addition to the complexity of getting the length of a string, another problem with C strings not recording their length is the vulnerability to buffer overflows.

Here’s an example of a buffer overflow:

Suppose there are two C strings s1 and s2 that are next to each other in memory. S1 holds the string “Redis” and S2 holds the string “MongoDB”, as shown in Figure 2-7.

If a programmer decides to execute:

strcat(s1, " Cluster");
Copy the code

Change the contents of s1 to “Redis Cluster”, but carelessly he forgot to allocate enough space for S1 before strcat is executed, so that after strcat is executed, s1’s data will overflow into s2’s space. The contents saved by S2 are unexpectedly modified, as shown in Figure 2-8

The above example is a buffer overflow, and SDS does not have a buffer overflow because when the SDS API needs to make changes to SDS, the API passesfreeThis property, judge itAvailable spaceWhether it is enough, and if so, modify it directly; If not, the API willAutomatically expandSDS space to the size required to perform the modification before performing the actual modification operation.

To sum up: C string of space expanding need programmers manual judgment and expand, if there is no expansion is possible buffer overflow, and SDS API will automatically help us to expand the space, do not need to programmers manual to expand the space, thus put an end to the occurrence of the buffer overflow, and SDS based on len and free the expansion of space operation on these two properties, It is also much faster than C.

Reduce the number of memory reallocations caused by changing the string pool

Memory redistribution when refers to modify the string, due to insufficient memory space or beyond, and you need to perform memory redistribution operation, the operation due to involve the memory allocation, execution time cost is extremely high, so we should try to avoid memory redistribution, and SDS is use the len and free the two properties, using two kinds of optimization strategy, This reduces the number of memory reallocation times.

Space pre-allocation (for expanding space)

The space expansion strategy for the C string is, I’ll give you as much space as you need, but when you expand the space for the string it’s still going to have zero free space.

When an SDS API modifies an SDS and needs to expand its space, the program allocates not only the space required for the modification, but also additional unused space to the SDS.

Where, the amount of extra allocated unused space is determined by the following formula:

  • If SDS is modified, the length of SDS (i.elenProperty) will be less than1 MB, then program allocation andlenProperty of the same size of unused spaceWhen the SDSlenThe value of the property andfreeProperty has the same value. For example, if modified after SDSlenWill become13Bytes, then the program will also allocate13Bytes of unused space, SDSbufThe actual length of the array is going to be13 plus 13 plus 1 is 27Byte (an extra byte is used to hold null characters).
  • If you modify SDS, the length of SDS will be greater than or equal to1 MB, then the program willdistribution1 MBUnused space of. For example, if modified after SDSlenWill become30 MB, then the program will allocate1 MBThe unused space of SDSbufThe actual length of the array will be30 MB + 1 MB + 1 byte

With this preallocation strategy, SDS reduces the number of memory reallocations required to grow strings N times in a row from a certain N to a maximum of N.

Lazy space free (for reclaimed space)

The recycling strategy for C strings is to immediately use memory reallocation to reclaim the shortened extra bytes. Each shortening will perform a space reallocation.

SDS does not perform memory reallocation immediately, but uses the free attribute to record the number of bytes and wait for future use.

For example, the sdSTRIm function takes an SDS and a C string as arguments, and removes all characters that appear in the C string from the left and right sides of the SDS, respectively.

For example, for the SDS value s shown in Figure 2-14, execute:

sdstrim(s, "XY"); // Remove all 'X' and 'Y' from SDS stringCopy the code

The SDS will be modified to look like figure 2-15.

Note that the SDS after sdSTRIm does not release the extra 8 bytes of space, but instead reserves the 8 bytes of space in the SDS as unused space, which may be used in future SDS growth operations.

Through a lazy space free strategy, SDS avoids the memory reallocation required to shorten strings and optimizes future growth operations.

SDS also provides apis that allow you to actually free up unused space in SDS when you need it, so you don’t have to worry about wasting memory with lazy space free strategies.

Store binary files in any format

The characters in a C string must conform to some encoding (such as ASCII), and the string must contain no null characters except the end of the string, otherwise the first null character read by the program will be mistaken for the end of the string — these limitations allow C strings to hold only text data. It cannot save binary data such as pictures, audio, video, and compressed files.

And SDS can be used to store these binary data, which is why we call a buff array a byte array, which holds a series of binary data, not characters.

By using binary-safe SDS instead of C strings, Redis can store binary data in arbitrary formats as well as text data.

Compatible with some C string functions

SDS apis set the end of data stored by SDS to null characters, so SDS that store text data can reuse some of the functions defined by the <string.h> library.

conclusion

C string SDS
The complexity of getting string length is O(N). The complexity of getting string length is O(1).
The API is not secure and may cause buffer overflows. The API is secure and does not cause buffer overflows.
Changing the length of a stringNSecondary necessity requires executionNSecondary memory reallocation. Changing the length of a stringNThis command can be executed at most timesNSecondary memory reallocation.
Only text data can be saved. Can save text or binary data.
You can use all<string.h>Functions in the library. You can use some of it<string.h>Functions in the library.

Reference book Redis Design and Implementation Huang Jianhong