Redis is commonly used for distributed KV cache, many people just use it, but do not know that there are a lot of hidden secrets at the bottom.

Type String

String is the most basic data type supported by Redis. First, let’s take a look at String, its data structure and storage.

Redefine SDS to store strings

As we all know, Redis is written in C language, and C language is no String type, only char[], and in the initialization time must specify the size of the type cannot be changed. In order to achieve Dynamic addition and extension functions, such as INCR command and append command, Redis defines and maintains a SDS (Simple Dynamic String) to achieve these functions.

Let’s take a look at the data structures defined in the Redis source code. There are five types to save space.

Len (char[]) time complexity O(n); 2, alloc: c does not have a String type, only char[], and char[] must allocate the space length first, char[] must allocate the space length first, char[] must be expanded after the data grows.

3. Falgs: always occupies a byte. The lowest three bits represent the type of the header. There are five types of headers, defined as constants in SDS.h. 4, BUf [] : C char arrays, ending with ‘\0’, means that storing binary data cannot contain ‘\0’. Images, audio, etc., can be problematic if stored in binary — that’s why Redis says its SDS is a binary safe string.

SDS improvement on c original CHAR array

SDS implemented by Redis supports capacity expansion 2. Contains length len, and obtains length complexity O(1) 3. Space pre-allocation 4.

Advantages and disadvantages of SDS

advantages

  • Capacity expansion
  • Contains length len, get length complexity O(1)
  • Space preallocation

disadvantages

  • Additional memory needs to be allocated
  • Efficiency problems with frequent allocation and collection

Redis uses the memory allocation library Jemalloc

Jemalloc allocates memory to a power of 2 that is larger than N but closest to N, according to the number of bytes we request. This reduces the number of frequent allocations. Let me give you an example. If you ask for 6 bytes of space, Jemalloc actually allocates 8 bytes of space; If you request 24 bytes of space, Jemalloc will allocate 32 bytes. So, in the scenario we just described, the dictEntry structure takes up 32 bytes.

Space preallocation

Space preallocation is used to optimize string growth operations for SDS: When the SDS API modifies an SDS and requires space expansion for the SDS, the program allocates not only the space required for the modification, but also additional unused space for the SDS.

Where, the amount of extra allocated unused space is determined by the following formula:

  • If the length of SDS (the value of len) is less than 1 MB after modification, the program allocates the same amount of unused space as len, and the value of SDS LEN will be the same as the value of free. For example, if len of SDS becomes 13 bytes after modification, then the program allocates 13 bytes of unused space, and the actual length of the BUF array of SDS becomes 13 + 13 + 1 = 27 bytes (an extra byte is used to hold null characters).
  • If the SDS is greater than or equal to 1 MB in length after modification, the program allocates 1 MB of unused space. For example, if len of SDS becomes 30 MB after modification, the program allocates 1 MB of unused space, and the actual length of the BUF array of SDS will be 30 MB + 1 MB + 1 byte.

With a space preallocation strategy, Redis can reduce the number of memory reallocations required to perform consecutive string growth operations.

Inert release

Lazy space free is used to optimize SDS string shortening operations: When the SDS API needs to shorten the string saved by SDS, the program does not immediately use memory reallocation to reclaim the shortened extra bytes, but instead uses the free property to record the number of bytes and wait for future use.

Redis KV storage structure

In Redis, all storage is stored in the form of KV key-value pairs, K is a string type, which is SDS; V may be strings, lists, hash, etc. (Data structures supported by Redis). V is not directly defined as a specific type, but encapsulated in a layer of redisObject. The actual stored data structure is specified by the PTR pointer.

In addition, in order to save more space, Redis also has different ways to store PTR Pointers. On the one hand, when storing Long integers, Pointers in RedisObject are directly assigned to integer data, so that there is no need for additional Pointers to point to integers, saving the space overhead of Pointers. On the other hand, when string data is stored and the string is less than or equal to 44 bytes, the metadata, Pointers, and SDS in a RedisObject are a contiguous area of memory, thus avoiding memory fragmentation. This layout is also known as EMBSTR encoding. Of course, when the string is larger than 44 bytes, the amount of data in SDS begins to grow. Redis no longer arranges SDS and RedisObject together, but allocates independent space for SDS and points to SDS structures. This layout is called raw encoding. As is shown in

  • Embstr encodes short strings, one memory allocation; It is read-only, and if you modify the content, it becomes RAW (even if it does not exceed 44 bytes);
  • Raw encoding allocates memory space multiple times, storing strings longer than 44 bytes.

If raw native SDS character length is reduced to less than 44, will it reverse into EMBSTR encoding? Not; Redis underlying code, irreversible after conversion (no rollback).

conclusion

Redis is a common cache middleware, we must understand its data structure and storage, in order to use the time to choose a more appropriate data structure and memory estimates.

Redis memory computing address www.redis.cn/redis_memor…