Simple Dynamic Strings (SDS)
Redis is implemented in C language. Instead of using the traditional string representation of C language, Redis uses a self-constructed abstract type of Simple Dynamic String (SDS), which is used as the default string representation of Redis. The convention of ending ‘\0’ in C is still adhered to.
In addition to holding string values, the buffer is also acted on: the AOF buffer in the AOF module. The input buffer in the level 1 client state is implemented by SDS.
1. Structure of SDS:
** Note: ** In saving null character (‘\0’ character) space does not count in SDS len attribute, allocate 1 extra byte space for null character. The operations from null characters to the end of the string are performed automatically by the SDS function. The advantage of following this convention is that SDS can directly reuse some of the functions in the C character creation library.
SDS and C strings
1. Constant complexity Gets the length of the string
A C string does not record its length. To obtain the length of a C string, the entire string must be traversed. The complexity of the whole operation is O(N).
SDS records the length of the SDS string itself in the len attribute, and the operation complexity of obtaining the length is only O(1). SDS length manipulation is done automatically by the API without any manual length modification.
2. Prevent buffer overflow
C strings tend to cause buffer overflow. For example, when C strings are concatenated, the source string length is not allocated enough memory, which can cause a buffer overflow problem.
SDS determines whether the source string has enough memory before concatenating the string. If not, enough space is allocated, eliminating the possibility of buffer overflows.
3. Reduce the number of memory reallocation times caused by string modification
The C string underlying implementation is always an array of N+1 length (an extra character space is used to hold null characters). Before performing the concatenation operation, you need to expand the space of the underlying array through memory reallocation; otherwise, a buffer overflow will occur. If truncation is performed, memory needs to be reallocated to free up space that is no longer used by the string, otherwise a memory leak will occur.
The length of the BUF array in SDS is not necessarily the number of characters plus one, the array can contain unused bytes, and the number is stored in the free attribute.
3.1 Space preallocation
String growth operations to optimize SDS: When SDS expands space, the program allocates not only the required space to the SDS, but also additional unused space.
1. If the length of SDS (len's attribute value) is less than1MB, allocates the same amount of unused free space as Len. The actual length is len + free +1byte2If the length of SDS is greater than or equal to1MB is allocated1MB of unused space.Copy the code
With a space preallocation strategy, Redis can reduce the number of memory reallocations required to perform consecutive string growth operations. SDS reduces the number of memory reallocations required to grow a string N times in a row from a certain N to a maximum of N.
3.2 Lazy space release
The user optimizes the shortening of SDS strings: when SDS does space shortening, the program does not immediately use memory reallocation to reclaim the shortened extra bytes, but uses the free property to record the extra bytes for future use.
A lazy release strategy, SDS avoids the memory reallocation operations required when shortening strings and provides optimization for future growth. You can actually free up unused space without worrying about memory waste.Copy the code
4. Binary security
Avoid C string, read until null character (‘\0’) end read problem. The BUF array in SDS structure is processed in binary mode, which stores columns of binary data. The len attribute value is used in SDS to determine whether the string ends. Redis can store not only text data, but also binary data in any format.
5. Compatible with some C character functions
Following the requirements of C characters, some functions in C can be reused, avoiding unnecessary code duplication.