Redis Basic Data Structure simple string SDS (2)

This is the second day of my participation in the November Gwen Challenge. Check out the details: the last Gwen Challenge 2021

Redis version: 3.0.4

Structure definition of SDS

Redis does not directly use the STRING of C language to represent the STRING type of Redis. The bottom of the STRING type of Redis is a type of SDS structure. This is the structure commonly used in C.

The structure of SDS is defined in sdS.h/SDSHDR structure

struct sdshdr {
	
	// Records the number of unused bytes in the BUF array
	int free;
	
	// Records the number of bytes used in the BUF array
	// is equivalent to the length of the string saved by SDS
	int len;
	
	// An array of bytes to hold strings
	char[] buf;
}
Copy the code

Note: after Redis3.2.0, there is no free property. Instead, the alloc property is used to represent the actual allocated length of the BUF array, namely the value of free+len. = =

Suppose that a string with the value ‘Redis’ is stored in Redis, then its structure in SDS would look something like this.

C string and SDS

Why doesn’t Redis just use C strings, but encapsulates SDS to store string data? This is because the C string itself has several problems, which is not suitable for Redis based on speed, so it needs to encapsulate SDS structure to improve the reading speed. Here are a few of their differences:

1. Constant complexity Gets the length of the string

In C, to get the length of a string, the program iterates through the string until it encounters the null character ‘/0’, which represents the end of the string, before calculating the length of the string, which is O(N) time.

Unlike C, SDS defines the len attribute to represent the length of the saved string, using only O(1) time complexity. Setting and updating the SDS length is done automatically by the SDS API at execution time without additional manual changes to len values with SDS.

This ensures that fetching string lengths does not become a performance bottleneck for Redis.

2. Prevent buffer overflow

Another problem with C strings that do not record string lengths is that they are prone to buffer overflows. When modifying a string in C, you need to allocate space for the string manually. If you forget to allocate space in advance and the space is insufficient, the string buffer overflows, causing other string values to be changed.

Unlike C, SDS has an automatic space allocation policy. When THE SDS API needs to modify the SDS, THE API will check whether the SDS space is enough. If not, the API will automatically expand the space to the required size, and then modify the string.

Therefore, using SDS does not require manual modification of SDS space size, and avoids the possibility of buffer overflow.

3. Reduce the number of memory reallocation times caused by string modification

Memory reallocation is usually a time-consuming operation, and for Redis, data is often used in scenarios where changes are frequent and speed is required. Therefore, each memory reallocation may have an impact on performance.

By using unused space, SDS realizes two optimization strategies: ** space pre-allocation and lazy space release.

3.1 Space preallocation

Space preallocation is used to optimize string growth operations. When the SDS API modifies the SDS and runs out of space, it automatically allocates the required space to the SDS, as well as additional unused space to the SDS.

The amount of additional unused space allocated is determined by the following formula:

If the length of SDS is less than 1MB after modification, the program will allocate the same amount of unused space to SDS as len, that is, the additional space length of free=len.
If the length is greater than or equal to 1MB, the program allocates an additional 1MB of unused space.

That is, the unused space allocated is less than or equal to 1MB.

With a space preallocation strategy, Redis can reduce the number of memory reallocations required to perform consecutive string growth operations.

3.2 Lazy space release

Lazy space free is used to optimize SDS string shortening operations: When the SDS API needs to shorten the string saved by SDS, the program does not immediately use memory space reallocation to reclaim the shortened extra bytes, but instead uses the free property to record the number of bytes and wait for future use.

By using a lazy space free strategy, SDS avoids the space reallocation operations required to shorten strings and provides optimization for possible future growth operations.

SDS has apis to actually free up SDS unused space.

4. Binary security

C string of characters must comply with some coding (such as ASCII), and in addition to the null character at the end of the string, the string cannot contain null character, or by the program first read in null characters will be mistaken for the end of the string, this limits the C string can only save the text data, can’t save pictures, audio, video and other binary data.

SDS avoids this problem. Instead of reading a null character and assuming it is the end of the string, the program decides on len length.

5. Functions that are compatible with part of C strings

In the STRUCTURE of SDS, the character data stored in the char array ends with ‘/0’. The advantage is that programs can use C string functions directly without having to wrap them.

conclusion

C strings differ from SDS as follows:

C string	SDS
The time to get the string length is O(N)	O(1)
The API is not secure and may cause buffer overflows	The API is secure and does not cause buffer overflows
Changing the length of a string requires N memory reallocations	The number of memory reallocations is less than or equal to N
Only text data can be saved	Can store a range of binary data, such as text, images, audio/video
You can use all the functions of <string.h>	Only partial functions of <string.h> can be used