No matter what programming language you’re using, strings are probably the most frequently used everyday. You can say that string objects are fundamental and important.

So today I’d like to talk to you about the implementation of Redis string. Let’s take a look at this seemingly simple string. Why is it so difficult to implement?

After reading this article, you can learn:

  • Redis string object multiple data structures
  • Underlying data structure transformation relationships
  • SDS Dynamic string
  • Redis is implemented in C, so why not use C strings directly?

Redis object

Whenever we save a new key-value pair in Redis, Redis creates at least two objects, one for the save key and one for the save value.

Each object in Redis is a redisObject structure with three properties:

  • Type: indicates the type of the object
  • Encoding: Indicates the encoding
  • PSTR, a pointer to the underlying data structure

Type indicates the object type. Currently, the following types can be used:

  • String object
  • A list of objects
  • The hash object
  • A collection of objects
  • Ordered set object

We can use the TYPE directive of Redis to check the value object TYPE of the current key.

The Redis key object is simple to say, it is always a string object, while the value object is more complex, it can be a string, can be a collection, can be a dictionary, etc.

Each object type is implemented with multiple data structures, and encoding stands for the underlying data structure type:

We can use Redis’s OBJECT ENCODING directive to view the underlying ENCODING of a current key value OBJECT:

Each object type may support multiple data structures, with the following relationships:

Redis string object

Redis string objects support three underlying data structures, which are (used below)

OBJECT ENCODING

Output) :

  • int
  • embstr
  • raw

Int represents an integer of type long, and will be used as long as the string object holds an integer value.

Note that only integers use ints. If they are floating point numbers, Redis internally converts the floating point number to a string value before saving it.

The underlying data structure of EMBSTR and RAW type is actually SDS (simple dynamic string). Redis defines SDSHDR as a structure.

The difference is that the embSTR type will call the memory allocation function to allocate a contiguous memory space containing the redisObject and SDSHDR data structures in turn.

The RAW type will call the memory allocation function twice to allocate two memory Spaces, one containing the redisObject structure and the other containing the SDSHDR structure.

SDS simple dynamic string

Let’s look at simple dynamic strings.

sdshdr

The underlying structure of SDS contains three attributes:

  • Len, used to record the length used by the following buf[] array, which is equal to the length of the string held by SDS
  • Free, which records the unused length of the following buf[] array
  • Buf [], the user saves the string

Note that the last byte in the buf[] array will hold a null character \0, indicating the end of the string. The len length in SDSHDR is a null character that does not contain this.

This is the same design as strings in C.

SDS scale

When we grow the SDS string, if the SDS space is insufficient, THE SDS will expand first and then modify the operation.

Assume that the current SDS value is as follows:

Now execute a string concatenation instruction,

APPEND sds " Cluster"
Copy the code

Since the total length of strings is 13 after the stitching is completed, the remaining space of SDS is insufficient, so SDS will expand and perform a memory reallocation again.

Because of our example above, the SDS modified length (the len attribute value) is less than 1MB, the program will allocate the same amount of unused space as len, which is the same as the free attribute value in SDS.

Now the actual length of the SDS BUF array is:

13 + 13 + 1 = 27Copy the code

If SDS changes the length to greater than 1MB, Redis will allocate 1MB of unused space.

Assuming that SDS LEN is 20MB after modification, the program will allocate 1 MB of unused space, and the actual length of the SDS BUF array is:

20MB+1MB+1byte
Copy the code

Following the above example, if we execute the string concatenation instruction again:

APPEND sds " Guide"
Copy the code

SDS is not using enough space to hold the concatenated string this time, so there is no need to reallocate memory this time.

SDS inert space free

When we shrink the string, the SDS string value will be shortened so that the SDS free space will be larger.

Instead of immediately using the memory allocation function to reclaim the excess space, the program uses the free attribute to record the excess space for later use.

By using this lazy space free strategy, SDS avoids the memory reallocation operations required for string shortening and can use the excess space directly after SDS string growth.

SDS also has an API that actually frees up unused space for SDS, so you don’t have to worry about wasting memory with lazy release strategies.

Why doesn’t Redis just use C strings?

Redis uses C programming language at the bottom, so in fact C language also has strings, Redis can reuse C language strings ~

So why did Redis go behind closed doors and redesign an SDS data structure?

This is because the simple string form of reason C does not meet the requirements of Redis in terms of security, efficiency and function.

First, if we need to get the length of a C string, we can use the following function:

strlen(str)
Copy the code

The strlen function actually iterates, counting each character until it encounters a null character that represents the end of the string.

This operation takes O(N) time.

SDS can be read directly because len records the length of the current string, and the time complexity is only O(1).

This ensures that fetching string lengths does not become a Redis performance bottleneck.

The second point is that each increment or shortening of a C string will cause memory to be reallocated. Otherwise, it may cause a buffer overflow or a memory leak.

Because SDS uses the space preallocation strategy, if SDS increases continuously N times, the number of memory reallocation is reduced from certain N times to maximum N times.

Third, C strings cannot contain null characters, otherwise the first null character will be mistaken for the end of the string, which greatly limits the scenarios used.

SDS does not have this problem because you can use the value of the len attribute to determine whether the string ends.

So SDS strings are binary safe.

String object underlying data structure conversion

SDS structure requires extra attribute record length and unused length, although it reduces the complexity of the system and improves the performance, it still pays the corresponding price, that is, there is a certain amount of memory space waste.

So Redis string object underlying structure is not all SDS.

If the string object holds an integer value, the integer value is stored directly in the PTR property of the string object structure.

If the save is not an integer, but the string length is less than or equal to 39 bytes, the string will be encoded using embstr. In the end, raw coding is used when both of the above are not met.

In addition, int encoding and EMbstr will also be converted to raw encoding under certain conditions.

If you append the underlying integer so that it is no longer an integer, the encoding of the string object changes from int to RAW.

The embstr encoding is actually read-only because Redis doesn’t provide any programs to modify the embstr encoding object.

So once we modify the embstr encoding string, the encoding of the string object will change from EMbstr to RAW.

conclusion

Redis strings are the most frequently manipulated data structures, and it is important to understand the underlying string objects.

The underlying structure of Redis string object can be divided into integers and SDS. When we save integers in the operation set command, we will directly use integers.

SDS is another structure of Redis internal definition. Compared with the C language string, it can quickly calculate the length of the string and does not need frequent memory allocation. Finally, A little SDS is a binary safe string.

Although SDS design brings many advantages, compared with C language, this structure occupies at least 4(Len)+4(free)=8 bytes more memory. If buF [] array still has unused space, space waste is more serious.

So Redis strings are not a panacea and can take up a lot of memory in some scenarios. But if you use a different data structure, you might use less memory.

So we must use the right data organization for the right scenario.

Three things to watch ❤️

If you find this article helpful, I’d like to invite you to do three small favors for me:

  1. Like, forward, have your “like and comment”, is the motivation of my creation.

  2. Follow the public account “Java rotten pigskin” and share original knowledge from time to time.

  3. Also look forward to the follow-up article ing🚀

  4. [666] Scan the code to obtain the learning materials package