This is the 24th day of my participation in the August More Text Challenge

Instead of using the traditional string representation of C directly (an array of characters ending with a null character \0), Redis builds an abstract type called simple dynamic string SDS and uses SDS as the default string representation of Redis.

SDS data structure


struct sdshdr{
	// Records the number of bytes already used in the BUF array
	// is equal to the length of the string saved by SDS
	int len;
	// Records the number of unused bytes in the BUF array
	int free;
	// Byte array the user saves the string
	char buff[];
}
Copy the code

In the figure above is an SDS object with the string value Redis; Length 5, remaining free space 3; ‘\0’ is that SDS follows the convention of C strings ending in an empty string (it follows because SDS can reuse some C library functions) and saves one byte of the empty string without counting in len;

SDS and C strings


C uses an array of characters of length N+1 to represent strings of length N, and the number always ends with the empty string ‘\0’.Comparing the two structures, let’s analyze why Redis defines SDS by itself

1. Constant complexity Gets the length of the string

Because the C string does not record its own length information, in order to obtain the length, the whole string must be traversed each time, and the time complexity is O(N).

SDS itself has an attribute len that stores its own length, so it only needs to obtain this attribute, and the time complexity is O(1).

And setting and updating SDS lengths is done automatically at execution time using the SDS API. So we made sure that the STRLEN command didn’t become a bottleneck for us.

2. Prevent buffer overflow

In addition to the complexity of retrieving length, C strings that do not record len are prone to buffer overflows. Like C string concatenationchar *strcat(char *dest,const char *src)Operation; You need to allocate enough memory for DEST beforehand, and if you forget to allocate memory for DEST beforehand, a buffer overflow will occur. For exampleDifferent from C character,SDS space allocation strategy eliminates the possibility of buffer overflow. When the SDS API needs to modify the SDS, the API checks whether the SDS space meets the requirements for modification. If not, the API automatically expands the SDS space to the size required for modification. Because capacity expansion is automatically done by the API, no buffer overflows occur!

3. Reduce the number of memory reallocations caused by string changes

Because each increment or shortening of the C string requires a memory reallocation:

- need to growth, the program through the memory redistribution to expand the size of the underlying array - if forgotten, it will produce the above 2 buffer overflow, shortening, such as phase operation (trim), so after this, need memory redistribution to release excess portion of the space, what will happen to forget memory leaks;Copy the code

Because memory redistribution involving complex algorithm, and may need to perform a system call, it is usually a more time-consuming operation Redis as a database, if every time to memory allocation affects performance SDS a field is unused free space, through which can achieve space pre-allocated and inertia space released two optimization strategies

Space preallocation

The space pre-allocated user optimizes the SDS string growth operation. When the API makes changes to an SDS that require more space, the program allocates additional unused space to the SDS — if the SDS is less than 1M in length (the value of the LEN attribute) after the changes, the program allocates the same amount of unused space as the LEN attribute. For example, if len grows to 13 bytes, the pre-allocated memory size is 13+13+1 = 27(the extra 1 byte is an empty string) — if SDS is longer than 1M; Then 1M unused space will be allocated; For example, after increasing the SDS length to 30M, then allocate 1M unused space 30M+1M +1byte; If it grows larger than 1M, at most 1M unused space will be pre-allocated.

With a preallocation strategy, Redis can reduce the number of memory reallocations required to perform successive string growth operations

Inert space release

Lazy space frees are used to optimize SDS string shortening operations. When the API needs to shorten strings, the program does not immediately use memory reallocation to reclaim excess bytes. SDS, however, provides an API that allows us to actually free up SDS unused space when we need it, so there’s no need to worry about lazy space free strategies causing memory waste!

4. Binary security

C string must comply with some coding (such as ASCII), and, except for the end of the string inside cannot contain null character ‘\ n’, otherwise the first by the program reads in an empty string will be mistaken for a string at the end, these restrictions make C string can only save the text data, and can’t save as pictures, audio, video, and so on binary files.

SDS does not have this problem. Redis’s BUF array is used to hold a series of binary data.

conclusion


1. When does Redis use C strings?

In Redis, C strings are only used as string literals where string values do not need to be modified, such as printing logs.

2. The difference between SDS and C strings

①. C string gets string length complexity O(N); SDS is O(1) because SDS has a special len attribute; (2) C strings may have buffer overflow, but SDS does not, because THE SDS API automatically allocates memory ③.C strings need to be re-allocated every time they grow and shorten, while SDS has its own optimization strategy: space pre-allocation and lazy space release ④.C strings can only store text data, while SDS stores binary data