preface
SDS (simple dynamic string) is the most basic data type of Redis, so what is SDS string? Why is Redis not directly represented by C strings? Why is SDS more efficient than C strings
Let’s get straight to those questions
What is SDS
SDS (Simple Dynamic String) stands for simple dynamic String and is the default string representation of Redis
C strings use an array of characters of length N+1 to represent strings of length N, and the last element is always a null character ‘\0’ (SDS follows the convention that C strings end with a null character, that is, SDS last byte holds the null character ‘\0’). However, the strings used by C language cannot meet the requirements of Redis on the security, efficiency and function of strings, so WE build SDS by ourselves
There are two versions of SDS, this is the previous version of SDS:
typedef char *sds;
struct sdshdr {
// Records the number of bytes used in the BUF array, i.e. the length of the string held by SDS
unsigned int len;
// Records the number of unused bytes in the BUF array
unsigned int free;
// An array of bytes to hold strings
char buf[];
};
Copy the code
Of course, the len attribute here does not contain a null character at the end of an SDS (such as an SDS with unused space) :
Later versions of SDS define different structures for different length ranges, but generally the structures are the same: len for the used length, alloc for the allocated length of the BUF array, Flag for the SDS type, and buF for the byte array
SDS and C strings
1. Obtain the time complexity of the string length
C string does not record its length, so in order to get the length of a C string, you have to traverse the entire string, so the time is O(n).
For SDS, the len attribute is simply accessed, so Redis reduced the complexity of obtaining string length from O(n) to O(1)
2. Prevent buffer overflow
A C string with no record of its length can easily cause a buffer overflow. For example, a C string SRC is concatenated directly to the end of another string, dest. If dest does not preallocate enough memory, this will cause a buffer overflow
As for SDS, since it has the attribute of free, it should check whether the length is enough before performing the splicing operation. If it is possible, it should directly splice; otherwise, it should expand the space of SDS first and then splice
3. Reduce the number of memory reallocation times caused by string modification
With C strings, every time you grow or shorten it, you need to reallocate the array of C strings, such as concatenation of C strings, you need to expand the size of the underlying array through memory redistribution to avoid buffer overflow, or truncation of C strings, Memory reallocation is needed to free up unused parts of the string to avoid memory leaks
SDS can realize two optimization strategies of space pre-allocation and lazy space release through len and free attributes
- Space preallocation
Space preallocation is used to optimize string growth operations for SDS: When space expansion is required for SDS, it not only allocates the necessary space for modification, but also allocates additional unused space for SDS
If the modified SDS is less than 1MB, the value of the free attribute is the same as the value of the len attribute
If the modified SDS is greater than or equal to 1MB, 1MB of unused space is allocated
You can see that with this preallocation strategy, SDS reduces the number of memory reallocation required to grow strings N times in a row from a certain N to a maximum of N
- Inert space release
Lazy space free is used to optimize string shortening operations on SDS: Shortening operations on SDS are recorded with the free property instead of using memory reallocation to reclaim the extra bytes
Of course, there are apis to actually free up SDS unused space if needed
4. Binary security
Characters in C must conform to some encoding (such as ASCII), and strings cannot contain null characters, making C strings only hold text data
The API of SDS is binary safe, and will process the data of BUF array in SDS in the way of binary processing. Therefore, SDS can not only store text data, but also store binary data of any format
The encoding method of SDS
The encoding of a string object can be int, RAW, or embstr
If a string object holds an integer value that can be represented as long, the string object stores the integer value in the PTR property of the string object structure (converting void* to long) :
redis> SET number 666
OK
redis> OBJECT ENCODING number
"int"
Copy the code
Here’s the redisObject structure
All Redis objects have a Redis object header structure
Different objects have different types of type, and types of the same type have different encoding
Each object has a reference count refCount, and when the reference count reaches zero, the object is destroyed and memory is reclaimed. The PTR pointer points to the exact location where the object content (body) is stored
struct RedisObject { // It takes 16 bytes
int4 type; / / 4 bits type
int4 encoding; / / 4 bits coding
int24 lru; // 24bits Records LRU information
int32 refcount; // 4bytes reference count
void *ptr; // 8bytes, 64-bit system
} robj;
Copy the code
If the string object holds a string value and the string value is longer than 39 bytes, the string object holds the string value using SDS and sets the encoding of the object to RAW:
redis> SET sentence "I was forced to 996 and I just felt tired after work"
OK
redis> STRLEN sentence
(integer) 52
redis> OBJECT ENCODING sentence
"raw"
Copy the code
If the string object holds a string value with a length of 39 bytes or less, the string object will use embSTR encoding to hold the string value
Embstr is an optimized encoding for short strings. Instead of raw encoding, which calls memory allocation functions twice to create redisObject and SDSHDR structures, Embstr encodings allocate a contiguity of space by calling the memory allocation function once and include redisObject and SDSHDR structures
In addition, all the data of embSTR-encoded string objects is stored in a contiguous memory, so it is better to take advantage of the advantages of caching
redis> SET msg "hello"
OK
redis> OBJECT ENCODING msg
"embstr"
Copy the code
If you want to save a floating-point number into a string object, you will first converts the floating-point number to a string value, and then save the transformation of the proceeds of the string value, if the need of the floating-point calculations, will be stored in a string object in the string value into a floating point value, perform operations to convert the floating-point back again after a string value and save the inside
In addition, embstr-encoded string objects are actually read-only (because no modification program is written for embstr-encoded string objects). If any modification commands need to be executed on an EMbstr-encoded string object, the program converts the encoding from EMbstr to RAW, and then executes the modification commands