Memory mapped data structures

Memory-mapped data structures are a series of specially encoded sequences of bytes that typically consume much less memory to create than similar internal data structures, and can save users a lot of memory when used correctly. However, because memory-mapped data structures are encoded and manipulated in a much more complex way than internal data structures, they can consume more CPU time than similar internal data structures.

The integer set

An integer set is an abstract data structure Redis uses to store integer values of type INT16_t, INT32_t, or INT64_t, with no duplicate elements in the set.

The set of integers (intSet) is one of the underlying implementations of the set key: Redis uses the set of integers as the underlying implementation of the set key when a set contains only integer numeric elements and the set has a small number of elements.

The collection of integers is defined in the intset.h file

Specific definition:

typedef struct intset {
    uint32_t encoding;
    uint32_t length;
    int8_t contents[];
} intset;
Copy the code

The attributes are described as follows

  • Contents: Contents array is the low-level implementation of integer set. Each element of integer set is an array item of contents array. Each item is arranged in order according to the size of value from small to large, and the array does not contain any duplicate items.

  • Length: The number of elements in the integer set;

  • Encoding: contents The element type in the array, for example:

    • Encoding is INTSET_ENC_INT16, contents is an array of type int16_T, each item in the array is an integer value of type int16_T;
    • Encoding INTSET_ENC_INT32, contents is an array of type int32_T, each item in the array is an integer value of type int32_t;
    • Encoding is INTSET_ENC_INT64, contents is an array of type int64_T, each item in the array is an integer value of type int64_t;

Note: The contens array is declared int8_t, but the true type depends on the value of encoding

Upgrade of integer collection

Suppose we have a set of integers that are now storing integers of type INT16_t, and now we need to add an integer of type int32_t to it.

At this point, because C is a static language, we can’t put two different types of values in the same data structure, which would cause type errors.

Therefore, we need to change the original data structure to allow it to store larger integers, and change the original integer type to a larger type. This process is the upgrade of integer set.

Benefits of upgrading:

  • Since the set of integers can be automatically upgraded, we can store int16_t, INT32_t, int64_t and other types of integers in it without type errors.

  • Save memory, because without the upgrade, if we wanted to store different types of numbers in an array, we would have to define the array as int64_t at the beginning, and sometimes we only store int16_t in it, which would be a waste of memory. With the upgrade function, we can expand memory when needed, saving resources. Integer sets can only be upgraded, not degraded.

conclusion

  • The collection of integers is one of the underlying implementations of the collection key.
  • The underlying implementation of a collection of integers is an array, which holds the collection elements in an ordered, non-repetitive fashion. The program changes the type of the array as needed, depending on the type of the newly added element.
  • The upgrade operation brings operational flexibility to integer collections and saves as much memory as possible.
  • Integer sets only support upgrade operations, not degrade operations.

List of compression

Redis uses compressed lists as the underlying implementation of list keys when the keys and values of list items or hash keys are small integers or short strings.

The implementation of compressed list

  • Compressing lists, compressing, just to save memory.

  • It is a sequential data structure composed of a series of specially encoded contiguous memory blocks.

  • A compressed list can contain any number of nodes, each of which can hold either a byte array or an integer value.

Compressed list structure

A compressed list consists of the following sections: ZLBytes, Zltail, Zllen, N entries, and Zlend

nodes

Each object consists of previous_entry_length, encoding, and content.

Previous_entry_length: records the length of the previous node, in bytes.

use

  • You can calculate the start IP address of the previous node by subtracting the start IP address of the current node from the value recorded in this part.

  • Therefore, the principle of traversal from the end of the table to the top of the table is based on this, continuously calculating the start address of the previous node until it reaches the top of the table.

Encoding: Records the type and length of content.

  • The value contains 1, 2, or 5 bytes

    • Value with the highest bit being 00,01 or 10, indicating that content holds an array of bytes.

    • The array length is the record after removing the first two digits.

    • For example: 00001011, the length is 1 byte, the high two bits are 00, so the content store is a byte array, the last six bits 001011 table name content length is 11, for example: “WMLWMLWMLWW”

  • The length is 1 byte

    • The highest bit of the value is 11, indicating that Content holds an integer value. Length is to remove the two highest bits, other bits of the record.

Content: Holds the value of the node (byte array or integer)

Chain update

  1. Adding a node causes an update

If entry1 through entryN are less than 254 bytes, the previous_entry_length property of each node uses 1 byte to store the length of the previous node.

If entry1 creates a new node whose length is larger than 254 bytes, the original 1 byte length of entry1 cannot hold the length of new, and the space needs to be allocated to 5 bytes.

Similarly, entry2 is expanded as entry1 becomes 5 bytes, so that space is reallocated all the way to entryN. This is the chain update phenomenon.

  1. Deleting a node causes cascading updates

If the big node is larger than 254 bytes, small uses 1 byte to store its length, and the following N entries use 1 byte as above.

If small is removed, the previous entry1 node becomes big, which is larger than 254 bytes, so entry1 needs to be expanded by 5 bytes, and the same chained update operation as 4.1 occurs

Analysis:

In the worst case, the compressed list needs to be reallocated N times, and the worst complexity of each reallocation is O (N), and the worst complexity of chained updates is O (N²).

However, it is not uncommon for a list to consist of consecutive nodes between 250 and 253 bytes in length because of cascading updates.

And even if there is a chain update, there is little impact on performance if there are not many update nodes, and there are even fewer cases of overall update.

So you basically don’t have to worry about its performance.

Conclusion:

① Compressed list is a sequential data structure developed to save memory

② Compressed lists are used as the underlying implementation of list keys and hash keys. (Redis3.0 changed to QuickList) Ordered collections also use compressed lists.

③ A compressed list can contain multiple nodes, each of which can hold a byte array or integer value

(4) Adding a new node to the compressed list or deleting a few points from the compressed list may trigger a chain update operation, but such operation is not likely to occur. The worst time complexity of this operation is On^2