What is a compressed list

A compressed list in Redis is a sequential data structure consisting of a series of specially encoded contiguous memory blocks. Each node in the list can store either a byte array or an integer value.

It exists mainly to save memory.

Application scenarios of compressed lists

Compressed lists are mainly used as one of the underlying implementations of the List and Hash data structures in Redis.

In a List List, if the List has a small number of stored elements and each element is a small integer value or short string length, the List uses the compressed List structure as the underlying implementation.

In Hash, if the number of stored key-value pairs in the Hash is small and the keys and values of each key-value pair are small integer values or short strings, then the Hash uses the compressed list structure as the underlying implementation.

Compress list structure content

An outline of the structure of the compressed list:

Attribute zlbytes

The zlBytes attribute is 4 bytes long and holds the total number of bytes consumed by the entire compressed list.

This is used when reallocating memory for a compressed list or calculating the value of the Zlend attribute.

Attribute zltail

The zltail property takes up four bytes and holds the number of bytes from the end of the compressed list to the start of the compressed list.

With this attribute value, the program can directly determine the position of the last node of the table in O(1) time complexity without traversing the entire compressed list.

Attribute zllen

The zltail property takes up two bytes and holds the number of compressed list nodes.

Note that when the value of zllen is less than UINT16_MAX (65535), this value is the true number of nodes in the compressed list; when it is greater, the true number of nodes needs to be calculated by traversing the entire compressed list.

Attribute zlend

The zlend attribute takes up 1 byte and is used to indicate that it is at the bottom of the compression list.

Attributes entry

The attribute Entry is used as the node of the compressed list. We will describe the compressed list node entry in detail.

Compress the list node structure content

Compressed list node structure abstract code:

typedef struct zlentry {    

    unsigned int prevrawlensize, prevrawlen;   
    
    unsigned int lensize, len;    
    
    unsigned int headersize;    
      
    unsigned char encoding;   
       
    unsigned char *p;
    
} zlentry;
Copy the code

This is an abstract compressed list node structure, the actual storage is not like this structure, but it is designed to be easy to access related operations.

Properties prevrawlen, prevrawlenSize

The prevrawlen property is the length of the front node, and prevrawlenSize represents the size of the bytes required to store the prevrawlen property

Attributes len, lensize

The len attribute is the length of the current node value, and lensize represents the size of the bytes required to store the Len attribute. (Note: Len is the length of the string if the node is storing a string, or the length of the byte if the node is storing an integer value)

Attribute headersize

The property headerSize is the length of the current node header, equal to prevrawlenSize + LenSize

Attributes encoding

Attribute encoding Specifies the encoding type used for the current node value

Properties of p

Property p is a pointer to the memory address of the current node

Compressed list node concrete storage structure

The structure of a compressed list node looks like this:

previous_entry_length

The previous_entry_length property holds the length of the front node.

  • If the length of the front node is less than 254 bytes, the previous_entry_length will be 1 byte, and the length of the front node will be stored in this byte.
  • If the length of the current node is 254 bytes or greater, the length of previous_entry_length will be 5 bytes, the first byte of these 5 bytes will be set to 254, and the remaining 4 bytes will hold the length of the preceding node.

The program can mainly through the length of the front node, and according to the start address of the current node to calculate the start address of the front node. The traversal of a compressed list from the top to the bottom of the table is achieved through this principle.

encoding

The encoding property records the type and length of the data stored in the corresponding node’s Content property.

The length of encoding can be one, two, or five bytes. The number of bytes depends on the top two digits of the encoding value. When the top two digits start with 00, 01, and 10, content stores a string and encoding is one, two, or five bytes long, respectively. When it starts with 11, content stores integers and encoding is one byte in length.

At the beginning of 00:

Encoding Occupies one byte of space, indicating that content can store strings of less than or equal to 2^6-1 bytes

At the beginning of 01:

Encoding Takes up two bytes of space, indicating that content can store strings of less than or equal to 2^14-1 bytes

At the beginning of 10:

Encoding takes up five bytes of space. The last six bits of the first byte are padded with zeros, and the rest indicates that content can store strings of up to or equal to 2^32-1 bytes

11 At the beginning:

Encoding takes up one byte of space, and content stores a different range of integer values depending on the encoding

  • Encoding value: 11000000. Content Can store int16_T integers
  • Encoding value: 11010000, content can store int32_t integers
  • Encoding value: 11100000, content can store int64_t integers
  • Encoding value: 11110000, content can store 24-bit signed integers
  • Encoding value: 11111110, content can store 8-bit signed integers
  • Encoding value: 1111xxxx, content can store values between 0 and 12, no explicit type
content

The content attribute holds the value of the compressed list node, which can be either a string or an integer value. The type and length of the value are determined by the Encoding attribute value

Compressed list of cascading updates

If, in a compressed list, there are multiple consecutive nodes E1 through eN, each of which is between 250 and 253 bytes long, this property will record the length of each leading node with a value of one byte, plus the previous_entry_length mentioned above.

When we set a new node with a length of 254 bytes or more to the head of the compressed list, the new node will be the first node of the E1 node, and the e1 node’s previous_entry_length value is only 1 byte. The program does not store the five bytes that represent the length of the front node, so it allocates space for E1 to store four more bytes, and then E2 needs more space to store E1’s length, and so on, causing all subsequent nodes to need to update space, which is a chain update.

Chained updates cause the compressed list to be reallocated N times in the worst case, and the complexity of each reallocation is O(N), so the worst time complexity of chained updates is O(N^2).

However, in this case, only when exactly multiple continuous nodes are between 250-253 bytes can chain update occur, which is very unlikely. If it is between several consecutive 250-253 bytes, the performance will not be affected.

conclusion

First of all, it describes the meaning and application scenarios of Redis compressed list, and understands the data structure of compressed list mainly to save memory. Then, it has a detailed understanding of its data structure and its main chain update operation, and has a deep understanding of the bottom of compressed list.

Reference: Redis Design and Implementation

For more Java backend development related technologies, you can follow the public account “Red Orange ah”.