This is the 7th day of my participation in the August More Text Challenge

1. Introduction

The integer set intset is an ordered structure that stores integer data, used when the elements of the Redis collection type are all integers and are all in the range of 64-bit signed integers

In a collection of integers, the underlying encoding is converted from intSet to HashTable when the number of elements exceeds a certain number (512 by default) or when non-integer elements are added

2. Data storage

The set of integers can hold int16_t, INT32_t, and int64_t data without duplication

// intset.h
typedef struct intset {
    uint32_t encoding;
    uint32_t length;
    int8_t contents[];
} intset;
Copy the code
  • Encoding: Indicates the encoding type
    • INTSET_ENC_INT16: when the value is 2 and the element value is in [-2^15,2^15-1], the element occupies 2 bytes
    • INTSET_ENC_INT32: when the value is 4 and the element value is in [-2^31,-2^15) or (2^15-1,2^31-1], the element occupies 4 bytes
    • INTSET_ENC_INT64: used when the value is 8 and the element value is at [-2^63,-2^31) or (2^31-1,2^63-1), where the element occupies 8 bytes
  • Length: indicates the number of elements
  • Contents: An array that stores elements, specifying the number of bytes representing an element according to encoding

The intSet structure determines whether to expand the capacity according to the value of the element to be inserted. The expansion will modify the Encoding field, which determines how many bytes the element occupies in the contents array, so when the encoding field changes, The memory used to hold elements in the contents array also needs to be expanded

When an element is inserted to cause expansion, the value of the element is either the maximum value or the minimum value after it is inserted into intSet

If the encoding field of the element to be inserted is larger than the Encoding field of intSet, capacity expansion is required and the element must not exist in IntSet

3. Basic operations

3.1 Query Elements

Steps for finding elements:

  • First, determine the encoding of the element to be searched. If the encoding is greater than the encoding of intset, the element definitely does not exist. Otherwise, call the intsetSearch function to further find the element
  • The intsetSearch function first checks if there is an element in intset, and returns 0 if there is none. If there are elements, then determine whether the element is between the maximum and minimum of intSet, return 0, otherwise continue to search
  • Binary search for elements returns 1 if found, 0 if not found
// intset.c
uint8_t intsetFind(intset *is, int64_t value) {
    // Get the encoding of the element to be found
    uint8_t valenc = _intsetValueEncoding(value);
    // If the encoding of the element to be searched is greater than the encoding of intSet, 0 is returned
    // Otherwise find the element further
    return valenc <= intrev32ifbe(is->encoding) && intsetSearch(is,value,NULL);
}
Copy the code
// intset.c
// is: the set of integers to find
// value: the element to be found
// pos: stores the position of the element when it is inserted or deleted
static uint8_t intsetSearch(intset *is, int64_t value, uint32_t *pos) {
    int min = 0, max = intrev32ifbe(is->length)- 1, mid = - 1;
    int64_t cur = - 1;

    if (intrev32ifbe(is->length) == 0) {
        // If there is no element in intSet, return 0
        if (pos) *pos = 0;
        return 0;
    } else {
        Return 0 if the element to be found is greater than or less than the largest element
        if (value > _intsetGet(is,max)) {
            if (pos) *pos = intrev32ifbe(is->length);
            return 0;
        } else if (value < _intsetGet(is,0)) {
            if (pos) *pos = 0;
            return 0; }}// Binary search element
    while(max >= min) {
        mid = ((unsigned int)min + (unsigned int)max) >> 1;
        cur = _intsetGet(is,mid);
        if (value > cur) {
            min = mid+1;
        } else if (value < cur) {
            max = mid- 1;
        } else {
            break; }}if (value == cur) {
        // Find the element, return 1
        if (pos) *pos = mid;
        return 1;
    } else {
        // No element found, returns 0
        if (pos) *pos = min;
        return 0; }}Copy the code

3.2 Inserting Elements

Steps for inserting elements:

  • Check whether the encoding of the element to be inserted is greater than that of intSet. If so, call intsetUpgradeAndAdd to upgrade and then add the element; otherwise, continue to add the element

  • Call intsetSearch to find the insertion location of the element to be inserted. Exit intsetAdd if the element already exists, otherwise it gets to pos where it was added and continues adding

  • Call intsetResize to expand intSet and allocate memory space for the element to be added

  • If the insertion position is not at the end of the array, all elements larger than the element to be added need to be moved back one place to make room for the element to be inserted

  • Save the element to be inserted into the insertion position, and increase the number of elements in intSet by one

// intset.c
intset *intsetAdd(intset *is, int64_t value, uint8_t *success) {
    // Get the encoding of the element to be inserted
    uint8_t valenc = _intsetValueEncoding(value);
    uint32_t pos;
    if (success) *success = 1;

    if (valenc > intrev32ifbe(is->encoding)) {
        // If the encoding of the element to be inserted is greater than that of intSet, an upgrade is required
        // Call intsetUpgradeAndAdd to upgrade intSet
        return intsetUpgradeAndAdd(is,value);
    } else {
        // If the element can be found in intSet, it is returned directly
        // Otherwise get the insertion position pos
        if (intsetSearch(is,value,&pos)) {
            if (success) *success = 0;
            return is;
        }

        // Expand the memory space
        is = intsetResize(is,intrev32ifbe(is->length)+1);
        // If the insertion position is not at the end of the array
        // Move all elements in the array larger than the element to be added back one place to make room for the element to be inserted
        if (pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+1);
    }

    // Save elements to be added
    _intsetSet(is,pos,value);
    // The number of elements in intset is increased by 1
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}
Copy the code

Steps to add elements after the upgrade:

  • Prepend identifies whether the element to be inserted is positive or negative

  • Change intset encoding, expand memory space

  • Adjust the memory space for each element from the last element forward. Starting with the last element prevents unprocessed data from being overwritten

  • To the head of the array if the element to be inserted is negative, to the end of the array if the element to be inserted is positive

  • The number of elements in intset is increased by one

// intset.c
static intset *intsetUpgradeAndAdd(intset *is, int64_t value) {
    uint8_t curenc = intrev32ifbe(is->encoding);
    uint8_t newenc = _intsetValueEncoding(value);
    int length = intrev32ifbe(is->length);
    // prepend identifies whether the element to be inserted is positive or negative
    int prepend = value < 0 ? 1 : 0;

    // Change the encoding of intSet
    is->encoding = intrev32ifbe(newenc);
    // Expand the memory space
    is = intsetResize(is,intrev32ifbe(is->length)+1);

    // Move forward from the last element and adjust the size of each element
    while(length--)
        _intsetSet(is,length+prepend,_intsetGetEncoded(is,length,curenc));

    if (prepend)
        // The element to be inserted is negative and stored at the head of the array
        _intsetSet(is,0,value);
    else
        // The element to be inserted is positive and stored at the end of the array
        _intsetSet(is,intrev32ifbe(is->length),value);
    // The number of elements in intset is increased by 1
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}
Copy the code

3.3 Deleting Elements

Steps for deleting elements:

  • Check whether the encoding of the element to be deleted is less than or equal to that of intSet. If the encoding does not indicate that the element to be deleted is not in intSet, return directly
  • Call intsetSearch to check whether the element to be deleted exists. If it does not, it will return directly. If it does, it will get pos
  • If the deleted location is not at the end of the array, all the elements after the deleted location will be moved forward by one place to directly overwrite the element to be deleted. Otherwise, intSet will directly discard the element to be deleted after shrinking the memory space
// intset.c
intset *intsetRemove(intset *is, int64_t value, int *success) {
    uint8_t valenc = _intsetValueEncoding(value);
    uint32_t pos;
    if (success) *success = 0;

    // If the encoding of the element to be deleted is less than or equal to the encoding of intSet, and the deletion position can be obtained
    // Further deletes the element
    if (valenc <= intrev32ifbe(is->encoding) && intsetSearch(is,value,&pos)) {
        uint32_t len = intrev32ifbe(is->length);

        if (success) *success = 1;

        // If the deleted position is not at the end of the array, all the elements after the deleted position will be moved forward one place, directly overwriting the element to be deleted
        // If the delete location is at the end of the array, intSet shrinks the memory space and discards the element directly
        if (pos < (len- 1)) intsetMoveTail(is,pos+1,pos);
        is = intsetResize(is,len- 1);
        // The number of elements in intset is reduced by 1
        is->length = intrev32ifbe(len- 1);
    }
    return is;
}
Copy the code

4. Commonly used API

// Initialize intSet
intset *intsetNew(void);
// Insert elements
intset *intsetAdd(intset *is, int64_t value, uint8_t *success);
// Delete elements
intset *intsetRemove(intset *is, int64_t value, int *success);
// Find if the element exists
uint8_t intsetFind(intset *is, int64_t value);
// Returns a random element
int64_t intsetRandom(intset *is);
// Gets the element at the specified position
uint8_t intsetGet(intset *is, uint32_t pos, int64_t *value);
// Get the number of intSet elements
uint32_t intsetLen(const intset *is);
// Get the total number of bytes occupied by intSet
size_t intsetBlobLen(intset *is);
Copy the code

Redis 5 Design and source code Analysis