This blog reference: Nuggets Redis booklet Ao Bing
If an interviewer asks you why single-threaded Redis are so fast, you might blurt out, because single-threaded, avoids context switching; Because based on memory, much faster than hard disk read and write; Because of the use of multiplexing network model. Whether you actually understand it or not, this is a good answer for more than half the interviewers, but it would be nice to add: Because Redis has carefully designed various data structures, such as String using SDS, list using Ziplist, QuickList and so on, it may be more brilliant, at least it can say part of the interviewer is not clear about things. Today we’ll take a look at the mystery of the String data structure that is most commonly used in Redis.
Start with bit operations
Bitmap many application scenarios, such as the well-known bloom filter (before the blog has introduced: the vernacular bloom filter “), such as statistical specified user login any date within a year, arbitrary statistical date, all user login, etc., can use a bitmap to achieve (before the blog also has introduced: Long blog: Redis is more than just get Set), so it’s worth taking a good look at Bitmaps, but this blog isn’t going to go into detail about bitmaps, and bitmaps are all about bitmaps.
If we were to insert a key with value “hello” into Redis, everyone would:
test:1>set key hello
"OK"
test:1>get key
"hello"
Copy the code
What if we were to implement this requirement using bitwise operations? What, DID I hear that right, that bitwise operations can fulfill this requirement? Sure, because in Redis, strings are stored in byte arrays.
What, you don’t believe me? Please read on.
To implement this with bitwise operations, we get the ASCII code for “Hello” and then compute the binary:
For example, the ASCII code for “h” is 104 and the binary code is 1101000:
The ASCII code for “e” is 65, binary is 101, and binary is 1100101:
The following bitmap is then formed:
Here we need to use bit operations to set:
test:1>setbit s 1 1
"0"
test:1>setbit s 2 1
"0"
test:1>setbit s 4 1
"0"
test:1>setbit s 10 1
"0"
test:1>setbit s 13 1
"0"
test:1>setbit s 9 1
"0"
test:1>setbit s 15 1
"0"
Copy the code
The order of setbits can be adjusted at will, as long as the resulting bitmap looks like this. (I adjusted the order of the seitbits here, ok, I admit that I typed it wrong and didn’t bother to type it again, the bitmap is the same anyway).
Then we get:
test:1>get s
"he"
Copy the code
It’s amazing, isn’t it, but it also shows that at the bottom of Redis, a String is an array.
SDS
String is the most widely used in any programming language or storage engine. However, String may have different implementations in different programming languages or storage engines. In Redis, the bottom layer of String is SDS, which stands for Simple Dynamic String.
Redis is developed in C language. C language does not have ready-made string types, but uses character arrays to represent strings. Why not directly do this, but “unique” to build their own SDS data structure to achieve strings?
Let’s start with what SDS is:
struct sdshdr {
int len;
int free;
char buf[];
};
Copy the code
The definition of SDS is relatively simple, with only three fields, and it can be seen literally:
- Len: Stores the actual length of the string
- Free: Free storage space
- Buf [] : stores actual data
As you can see, the SDS structure also contains byte arrays, but with two new fields.
For convenience, the following articles refer to strings represented directly by byte arrays as C strings, but remember that there is no ready-made string type in C.
Let’s look at the differences between SDS and C strings:
- Finding the length of a string
C language string, the length of the string can only be traversed, time complexity is O(N), single-threaded Redis means yali Mountain large, but now the introduction of a field to store the actual length of the string, time complexity is suddenly reduced to O(1).
- Binary security
In C language, reading string follows the “zero stop”, that is, reading string, when read “\0”, it is considered to have read the end, even if there is a string behind it will not read, such as pictures, audio and other binary data, often interlaced with “\0” in them, good pictures, audio will be destroyed… But now that you have a field to store the actual length of the string, when you read the string, look at the length of the string first, and then read the following number of bits.
- Buffer overflow
String concatenation is common operation in the development, C language string is not to record the length of the string, once we call the splicing function, and did not calculate in advance good memory, and then by the appearance of buffer overflow, but now the introduction of the free field, to record the rest of the space, do together before operation, first to see how much the remaining space, if enough, Then rest assured to do the splicing operation, not enough, on the expansion.
- Reduced memory reallocation times
- Space preallocation: When concatenating a string, Redis is thoughtful enough to allocate a certain amount of free space. The free space now seems a bit wasteful, but if we continue concatenating, the free space will come into play.
- Lazy free space: When we do string reduction, Redis does not immediately reclaim space, because you may have to concatenate the string again. If you do it again and still do not use the space, Redis will also reclaim the space.
Expansion strategy
If the value is smaller than 1 MB, the system uses a double capacity expansion policy, that is, allocate 100% more remaining space. If the value is larger than 1 MB, only 1 MB more remaining space is allocated each time.
The maximum length
Redis specifies that the string length cannot exceed 512 megabytes.
embstr raw
Redis strings can be stored in two ways: embstr and RAW. When the length is <=44, embstr is used to store the string:
set codebear abcdefghijklmnopqrstuvwxyz012345678912345678 "OK" debug object codebear "Value at:0x7f4050476880 refcount:1 encoding:embstr serializedlength:45 lru:1999016 lru_seconds_idle:36"Copy the code
When length >44, use RAW instead:
set codebear abcdefghijklmnopqrstuvwxyz0123456789123456781
"OK"
debug object codebear
"Value at:0x7f404ac30100 refcount:1 encoding:raw serializedlength:46 lru:1999188 lru_seconds_idle:3"
Copy the code
There are some blogs on the Internet that say 39 is the dividing line. Why are there two answers? Keep reading and you’ll see.
Let’s start by looking at the Redis object header.
#define LRU_BITS 24
typedef struct redisObject {
unsigned type:4;
unsigned encoding:4;
unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
* LFU data (least significant 8 bits frequency
* and most significant 16 bits access time). */
int refcount;
void *ptr;
} robj;
Copy the code
The Redis object header contains 4 bits +4 bits +24 bits +4 bytes +8 bytes (Pointers, in the 64-bit system, 8 bytes)=32 bits +12 bytes =4 bytes +12 bytes =16 bytes.
Let’s look at the differences between these two forms of storage:Embstr is stored in a compact form, with Redis object headers and SDS objects existing together (contiguously).
In general, in raw storage, Redis object headers and SDS objects do not exist together (discontinuous).
The size of a block of memory is 64 bytes.
Now that the pre-introduction is complete, let’s take a look at the definition of SDS for HDD 3.0.
struct sdshdr {
unsigned int len;
unsigned int free;
char buf[];
};
Copy the code
The header of the Redis object takes up 16 bytes, len and free of the SDS object take up another 8 bytes, 64-16-8=40, and the saved string ends with \0, which takes up another 1byte. Therefore, the actual stored string can only be <=39 bits, so under the earlier version of Redis, The boundary between EMbstr and RAW is 39.
Taking a look at the definition of SDS for Redis5.0,To view:You can see it’s a big change. Why make that big change? If the string length is small, sdSHdr8 is used to store the string. Len and alloc use 2byte, flags use 1byte, \0 end uses 1byte, total 4byte, 64byte-16byte (object header) -4byte=44byte. So in higher versions of Redis, the dividing line between EMbstr and raw is 44.
The String that Redis uses a lot involves a lot of things, and these things can distinguish mediocre development from good development, and become a good development, there is a lot to learn.