1. A worked example
- Requirement
  - When an image is saved to the storage system, record its image ID (10 digits) and its image storage object ID (10 digits)
- Analysis
  - The String type stores "one key, one value", which matches this requirement
  - A String can hold data as a binary byte stream, so it works as a general-purpose type: anything that can be converted into a byte array can be stored in it
- Using the String type
  - Storing 100 million images used about 6.4GB of memory
- The problem
  - A Redis instance with a large memory footprint responds slowly while it generates an RDB snapshot
- Root cause
  - The String type consumes a lot of memory to store this data
- The solution
  - Do not store the data as Strings
  - Switch to a more memory-efficient collection type, whose underlying implementation structure saves a great deal of memory
- The new problem
  - A collection type maps one key to a series of values, so it is not directly suitable for storing single-valued key-value pairs
2. Why is String memory expensive?
- Why do 100 million images take about 6.4GB of memory? 6.4GB / 100 million is about 64 bytes per record
- A record of one image ID and its image storage object ID actually needs only 16 bytes (two 8-byte Long values, 8 + 8 = 16)
- Besides the data itself, the String type needs extra memory to record the data length, space usage, and other information, known as metadata
  - When the data is small, this metadata takes up a large share of the space
- How String stores data
  - When the value is a Long integer, String stores it as an 8-byte Long integer, which is usually called the int encoding
  - When the data contains characters, the String type stores it in a Simple Dynamic String (SDS) structure
    - buf: the byte array that holds the actual data
    - len: 4 bytes, the used length of buf (extra overhead)
    - alloc: 4 bytes, the length actually allocated to buf, which is generally greater than len (extra overhead)
  - The RedisObject structure adds further overhead
    - 8 bytes of metadata
    - an 8-byte pointer to the actual data
  - int encoding
    - When the value is a Long integer, the pointer field of the RedisObject is assigned the integer value directly, so no pointer to separately stored data is needed
  - embstr encoding
    - When the value is string data of at most 44 bytes, the metadata, pointer, and SDS of the RedisObject sit in one contiguous memory area, which avoids memory fragmentation
  - raw encoding
    - When the string is longer than 44 bytes, the SDS holds more data, so Redis no longer places the SDS next to the RedisObject. It allocates separate space for the SDS and points to it from the RedisObject
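The three encodings above can be sketched as a small decision function. This is a simplified model, not Redis's actual code: the name `string_encoding` is hypothetical, the 44-byte limit is the figure quoted in these notes, and the integer check (canonical decimal text that fits in a signed 64-bit value) is an assumption about what counts as a Long.

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
EMBSTR_LIMIT = 44  # byte threshold quoted in the notes above


def string_encoding(value: bytes) -> str:
    """Return 'int', 'embstr', or 'raw' for a String value (simplified model)."""
    try:
        text = value.decode("ascii")
        number = int(text)
        # Assumption: only canonical decimal text that fits in a signed
        # 64-bit Long is stored with the int encoding.
        if str(number) == text and INT64_MIN <= number <= INT64_MAX:
            return "int"
    except ValueError:
        # Not ASCII digits, so fall through to the string encodings.
        pass
    return "embstr" if len(value) <= EMBSTR_LIMIT else "raw"


print(string_encoding(b"1101000060"))  # int
print(string_encoding(b"a" * 44))      # embstr
print(string_encoding(b"a" * 45))      # raw
```

On a live instance, the `OBJECT ENCODING <key>` command reports which encoding Redis actually chose for a key.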
- Because the 10-digit image ID and image storage object ID are both Long integers, each can be stored directly in an int-encoded RedisObject
  - Each int-encoded RedisObject takes 8 bytes of metadata plus the 8-byte pointer field that holds the integer, so the key and the value together take (8 + 8) * 2 = 32 bytes
  - Each entry in the hash table is a dictEntry structure that points to one key-value pair
  - A dictEntry has three 8-byte pointers, to the key, the value, and the next dictEntry, 24 bytes in total
  - jemalloc rounds the 24-byte request up to the next power of 2, so 32 bytes are actually allocated and the last 8 bytes are padding
- Only 16 bytes are useful data, but 32 + 32 = 64 bytes are stored per record
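The arithmetic behind the 64 bytes can be checked with a short sketch. The constants are the sizes quoted in these notes; the helper names (`next_power_of_two`, `bytes_per_record`) are hypothetical, and rounding to a power of 2 follows the notes' description of jemalloc rather than its full size-class table.

```python
REDIS_OBJECT_METADATA = 8  # bytes of RedisObject metadata
REDIS_OBJECT_POINTER = 8   # pointer field; holds the integer itself under int encoding
DICT_ENTRY_POINTERS = 3    # key, value, next dictEntry
POINTER_SIZE = 8           # bytes per pointer


def next_power_of_two(n: int) -> int:
    """Smallest power of 2 that is >= n (the notes' model of jemalloc rounding)."""
    size = 1
    while size < n:
        size *= 2
    return size


def bytes_per_record() -> int:
    """Memory for one image ID -> storage object ID pair stored as a String."""
    key_and_value = 2 * (REDIS_OBJECT_METADATA + REDIS_OBJECT_POINTER)  # 32
    dict_entry = next_power_of_two(DICT_ENTRY_POINTERS * POINTER_SIZE)  # 24 -> 32
    return key_and_value + dict_entry


per_record = bytes_per_record()
print(per_record)                       # 64
print(per_record * 100_000_000 / 1e9)   # 6.4 (GB for 100 million records)
print(per_record / 16)                  # 4.0 (stored bytes per useful byte)
```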
3. What data structure can save memory?
- Redis has an underlying data structure called ziplist (the compressed list), which is very memory-efficient
- Compressed list header
  - zlbytes: the length of the list
  - zltail: the offset of the tail of the list
  - zllen: the number of entries in the list
- Metadata for each entry
  - prev_len: the length of the previous entry (1 byte or 5 bytes)
  - len: the length of this entry, 4 bytes
  - encoding: the encoding mode, 1 byte
  - content: the actual data
- Entries are placed one after another in memory, so no pointers are needed to link them, which saves the space those pointers would occupy
- Analysis
  - Each entry holds one image storage object ID (8 bytes), so prev_len needs only 1 byte per entry
  - Redis implements the List, Hash, and Sorted Set collection types on top of compressed lists. The big benefit is that this saves dictEntry overhead
  - With a collection type, one key maps to a whole collection of data. The collection can hold many items but still uses only one dictEntry, which saves memory
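As a rough illustration of the entry layout, the sketch below adds up the per-entry fields using the sizes given in these notes (4-byte len, 1-byte encoding). The function names are hypothetical, and the 254-byte cutoff for switching prev_len from 1 byte to 5 bytes comes from the Redis ziplist implementation, not from these notes.

```python
LEN_FIELD = 4       # bytes, as stated in the notes
ENCODING_FIELD = 1  # bytes, as stated in the notes


def prev_len_size(previous_entry_len: int) -> int:
    """prev_len takes 1 byte for a previous entry under 254 bytes, else 5."""
    return 1 if previous_entry_len < 254 else 5


def entry_size(content_len: int, previous_entry_len: int) -> int:
    """Total bytes of one compressed-list entry under the notes' field sizes."""
    return prev_len_size(previous_entry_len) + LEN_FIELD + ENCODING_FIELD + content_len


first = entry_size(8, 0)        # first entry, holding an 8-byte ID
second = entry_size(8, first)   # next entry; the previous one is small
print(first, second)            # 14 14
print(entry_size(8, 300))       # 18, because prev_len grows to 5 bytes
```

Under these assumptions an 8-byte ID costs 14 bytes of entry space, compared with the 64 bytes per record measured for the String type.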
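- Remaining problem
  - A collection type maps one key to a series of values, so it cannot store a single-valued key-value pair directly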
4. How to use collection types to store single-valued key-value pairs?
- To save single-valued key-value pairs, use a two-level encoding scheme based on the Hash type
- Split the key of each single-valued pair into two parts: the first part becomes the key of a Hash, and the second part becomes a field inside that Hash, so the single value can be stored in the Hash
  - Take image ID 1101000060 and image storage object ID 3302000080 as an example. The first 7 digits of the image ID (1101000) are the Hash key. The last 3 digits of the image ID (060) and the image storage object ID are the field and the value inside the Hash, respectively
  - Images whose IDs share the same first 7 digits are saved in the same Hash, and images with other prefixes are kept apart by the Hash key. Each Hash is limited to 1000 entries, which compresses the storage, but search efficiency is low
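The two-level encoding can be sketched without a Redis server by letting a dictionary of dictionaries stand in for the Hash keys and their fields. The function names (`split_image_id`, `save_image`, `load_image`) and the in-memory `store` are hypothetical; against a real instance the two operations would correspond to `HSET` and `HGET`.

```python
from collections import defaultdict

PREFIX_DIGITS = 7  # first 7 digits become the Hash key, the last 3 the field

# Stand-in for Redis: outer key = Hash key, inner dict = fields of that Hash.
store = defaultdict(dict)


def split_image_id(image_id: str):
    """Split a 10-digit image ID into (hash_key, field)."""
    return image_id[:PREFIX_DIGITS], image_id[PREFIX_DIGITS:]


def save_image(image_id: str, object_id: str) -> None:
    hash_key, field = split_image_id(image_id)
    store[hash_key][field] = object_id  # role of: HSET hash_key field object_id


def load_image(image_id: str):
    hash_key, field = split_image_id(image_id)
    return store.get(hash_key, {}).get(field)  # role of: HGET hash_key field


save_image("1101000060", "3302000080")
print(split_image_id("1101000060"))  # ('1101000', '060')
print(load_image("1101000060"))      # 3302000080
print(load_image("1101000061"))      # None
```

Keeping the IDs as strings preserves the leading zero in a field such as 060.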
- Adding one record increases the memory footprint by only 16 bytes
- The Redis Hash type has two underlying implementations: the compressed list and the hash table
- The Hash type sets two thresholds for storing data in a compressed list. Once either threshold is exceeded, the Hash type switches to a hash table
  - hash-max-ziplist-entries: the maximum number of elements a Hash can hold while stored in a compressed list
    - Exceeded when the number of elements written is greater than hash-max-ziplist-entries
  - hash-max-ziplist-value: the maximum length of a single element in a Hash stored in a compressed list
    - Exceeded when the size of a single element written is greater than hash-max-ziplist-value
- Using only the last 3 digits of the image ID as the field of the Hash ensures that no Hash holds more than 1000 elements, so it stays in a compressed list as long as hash-max-ziplist-entries is set to at least 1000
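The two thresholds can be modeled as a simple check. This is a sketch under stated assumptions, not Redis's code: `stays_in_compressed_list` is a hypothetical name, the check is stateless (a real Hash that has switched to a hash table does not switch back), and the values 512 and 64 used below are assumed to be the shipped defaults for the two settings.

```python
def stays_in_compressed_list(fields: dict, max_entries: int, max_value: int) -> bool:
    """True if a Hash with these fields is within both compressed-list thresholds.

    max_entries models hash-max-ziplist-entries, max_value models
    hash-max-ziplist-value (length in bytes of any single field or value).
    """
    if len(fields) > max_entries:
        return False
    return all(len(k) <= max_value and len(v) <= max_value for k, v in fields.items())


# One full Hash from the image example: fields 000..999, 10-digit values.
full_hash = {f"{i:03d}".encode(): b"3302000080" for i in range(1000)}

print(stays_in_compressed_list(full_hash, max_entries=1000, max_value=64))  # True
print(stays_in_compressed_list(full_hash, max_entries=512, max_value=64))   # False
```

The second call shows why the entry threshold matters for this scheme: with a limit of 512, a Hash holding all 1000 fields would fall back to a hash table and lose the memory savings.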
5. To summarize
- When the saved key-value pairs themselves take little memory (such as the image ID and image storage object ID in this lesson), the metadata overhead of the String type dominates
  - RedisObject structure
  - SDS structure
  - dictEntry structure