Redis source code analysis series of articles
Why analyze from Redis source code
String low-level implementation – dynamic String SDS
Redis bidirectional linked list article all know
Interviewer: Tell me about Redis Hash base:…… (Interview questions from reading)
Skip list is sure not to understand 😏
preface
Da Ga Hao, today is still a full day, put aside the demands that will never be finished, refuse to ask rogue abnormal customers, simply learn technology, feel the charm of technology. (Ha, ha, ha, ha, ha)
In the past few weeks we have looked at the underlying data structure of Redis, such as dynamic string SDS, bidirectional linked list Adlist, Dict, skip list, if you do not understand the common types or underlying data structure of Redis please see the above portal.
Today, let’s go to the bottom of the set implementation integer set, if there is a set do not understand, common API use this article will not talk, look at the above portal ha.
Concept of integer set
The set of integers is an underlying structure designed by Redis and is the underlying implementation of set. This structure is used when the set contains only integer value elements and the set element has a small amount of data. But if you don’t, you’re going to use something else, so I’m not going to talk about that.
The following figure shows the actual composition of the integer set, which consists of three parts: encoding, length, and contents. (Just take a quick look here, and the details for each module are 😝)
The implementation of integer sets
Int set.h intset.h intset.h intset.h intset.h
// uint32_t encoding; // The encoding format is as follows. The default value is INTSET_ENC_INT16 uint32_t length; Int8_t contents[]; The element type is not necessarily ini8_T, the flexible array does not take up the size of the intSet structure, and the elements in the array are arranged from smallest to largest. } intset;#define INTSET_ENC_INT16 (sizeof(int16_t)) //16 bits, 2 bytes, indicating the range from -32,768 to 32,767
#define INTSET_ENC_INT32 (sizeof(int32_t)) //32 bits,4 bytes, indicating the range from 2,147,483,648 to 2,147,483,647
# define INTSET_ENC_INT64 (sizeof (int64_t)) / / 64, 8 bytes, said - the range of 9223372036854775808-9223372036854775807Copy the code
Encoding format
There are three types: INTSET_ENC_INT16, INTSET_ENC_INT32, INTSET_ENC_INT64, which correspond to different ranges respectively. See the comment information of the above code.
In order to save as much memory as possible (it’s money, after all, use 😭), we need to use different types to store data.
Number of set elements length
It keeps track of the length of the contents of the data, how many elements there are.
Holds the element’s array contents
Where the data is actually stored, the array is sorted from smallest to largest and contains no duplicates (because a set is duplicate-free, its underlying implementation is also inclusive).
Integer collection upgrade process (emphasis, manual star labeling)
The encoding is INTSET_ENC_INT16. The encoding is 16 bits. The length is 4, so the content has four elements, 1, 2, 3, and 4. If we were to add a numeric bit 40000, it would obviously exceed the range of INTSET_ENC_INT16 -32,768 to 32,767, which would be INTSET_ENC_INT32. So how did he upgrade from INTSET_ENC_INT16 to INTSET_ENC_INT32?
1. Learn about old storage formats
Let’s first look at how the four elements 1, 2, 3, and 4 are stored. First of all, we need to know how many bits there are. The calculation rule is the number of bits in length* encoding format, that is, 4*16=64. So each element takes up 16 bits.
2. Determine the new encoding format
The new element is 40000, which exceeds the INTSET_ENC_INT16 range of 32,768 to 32,767, so the new encoding is INTSET_ENC_INT32.
3. Add memory according to the new encoding format
As stated above, the encoding format is INTSET_ENC_INT32, and the calculation rule is the number of bits in length* encoding format, that is, 5*32=160. So the new bits are 64 to 159.
4. Set the value based on the encoding format
According to the new encoding format, each data should occupy 32 bits, but the old encoding format, each data occupy 16 bits. So we start at the back and get 32 bits at a time to store the data.
Are you kidding me? ☺.
First, the last 32 bits, 128-159, store 40,000. So 49-127 is empty.
Next, take the last 32 bits of the empty 49-127, the 32 bits from 96 to 127, and store the 4. thenPrevious 4 storage locations 48-63
and49-127 and then 64-95
These two parts make up a large part, i.e48-95.
It’s free now.
And then in the 48-95 majority, the last 32 bits, 64-95, are used to store the 3. So the previous 3 storage locations 32-47 and 48-95 and the remaining 48-63 make up a large part, 32-63, which is now empty.
And then I’m going to take most of the 32-63, and I’m going to take the last 32 bits, which is 32-63 again, and I’m going to store 2. So the previous two storage locations 16-31 are empty.
Finally, combine 16-31 with the original 0-31 and store 1.
At this point, the upgrade process is complete. Overall, it is divided into three steps: determine the new encoding format, add the required memory space, and adjust the data from back to front.
Here’s a little bit of a question, why do I adjust the data from back to front?
The reason is that if you go from front to back, the data might overwrite. For example, data 1 is in bits 0 to 15, and data 2 is in bits 16 to 31. If from front to back, we know that the new encoding INTSET_ENC_INT32 requires each element to occupy 32 bits, then data 1 should occupy bits 0 to 31, at which point data 2 will be overwritten, and data 2 will not be known from now on.
But from back to front, there is no overwriting because there is some new memory in the back.
Advantages of the upgrade
To save memory
Integer collections save memory by allowing the collection to hold three different types of values while ensuring that upgrades are only done when needed.
Demotion not supported
Once the array is upgraded, the code keeps the upgraded state forever. Even if 40000 is deleted, the encoding format will not be INTSET_ENC_INT16.
Integer set source analysis
Create an empty collection intSetNew
This method is relatively simple and is a step to initialize a collection of integers, as shown in the figure below.
The main steps are allocating memory, setting the default encoding format, and initializing the array length.
intset *intsetNew(void) { intset *is = zmalloc(sizeof(intset)); Is ->encoding = intrev32ifbe(INTSET_ENC_INT16); // Set the default encoding format INTSET_ENC_INT16 is->length = 0; // Initialize lengthreturn is;
}
Copy the code
Add elements and upgrade the insetAdd flowchart (key)
Add elements and upgrade insetAdd source analysis
Can be based on the above flow chart, with the following source analysis, here will not write ha.
// Add element // Input parameter *is is the original integer set //value is the element to be added //*success is the indicator of whether the element was added successfully. 1 indicates that the element was added successfully. Intset *intsetAdd(intset *is, int64_t value, Uint8_t valenc = _intsetValueEncoding(value); uint32_t pos; // If SUCCESS has no initial value, it is initialized to 1if(success) *success = 1; // If the new encoding format is larger than the current encoding format, upgrade and add elementsif(Valenc > intrev32ifbe(is->encoding)) {// Call another methodreturn intsetUpgradeAndAdd(is,value);
} else{// If the encoding format does not change, call the query method // input parameter is the original integer set //value is the data to be added //pos is the locationif(intsetSearch(is,value,&pos)) {// If found, return directly, because the data is not repeatable.if (success) *success = 0;
returnis; } // Set length is = intsetResize(is,intrev32ifbe(is->length)+1);if(pos < intrev32ifbe(is->length)) intsetMoveTail(is,pos,pos+1); } // Set data _intsetSet(is,pos,value); is->length = intrev32ifbe(intrev32ifbe(is->length)+1);returnis; } / /#define INT8_MAX 127
//#define INT16_MAX 32767
//#define INT32_MAX 2147483647
//#define INT64_MAX 9223372036854775807LL
static uint8_t _intsetValueEncoding(int64_t v) {
if (v < INT32_MIN || v > INT32_MAX)
return INTSET_ENC_INT64;
else if (v < INT16_MIN || v > INT16_MAX)
return INTSET_ENC_INT32;
else
returnINTSET_ENC_INT16; } // According to the encoding format of the input parameter value, Static intSet *intsetUpgradeAndAdd(intSet *is, Uint8_t curenc = intrev32ifbe(is->encoding); Uint8_t newenc = _intsetValueEncoding(value); Int length = intrev32ifbe(is->length); Int prepend = value < 0? 1:0; Is ->encoding = intrev32ifBe (newenc); is = intsetResize(is,intrev32ifbe(is->length)+1); // Cycle gradually until length is less than 0, resetting each value one by one, from back to frontwhile(length--) _intsetSet(is,length+prepend,_intsetGetEncoded(is,length,curenc)); // If value is negative, place it firstif (prepend)
_intsetSet(is,0,value);
else// If value is an integer, set the last element to value _intsetSet(is,intrev32ifbe(is->length),value); // Reset length is->length = intrev32ifbe(is->length)+1);returnis; } // Find the subscript value in the is set, return 1, save in pos, return 0 if not found, Static uint8_t intsetSearch(intset *is, int64_t value, uint32_t *pos) {int min = 0, max = intrev32ifbe(is->length)-1, mid = -1; int64_t cur = -1; // If the set is empty, the position pos is 0if (intrev32ifbe(is->length) == 0) {
if (pos) *pos = 0;
return 0;
} else{// Because the data is an ordered set, if the data to be added is greater than the last number, just put the value to be added at the end, return the maximum indexif (value > _intsetGet(is,intrev32ifbe(is->length)-1)) {
if (pos) *pos = intrev32ifbe(is->length);
return 0;
} else if(value < _intsetGet(is,0)) {value < _intsetGet(is,0)) {value < _intsetGet(is,0)if (pos) *pos = 0;
return0; }} // Ordered sets use dichotomywhile(max >= min) {
mid = ((unsigned int)min + (unsigned int)max) >> 1;
cur = _intsetGet(is,mid);
if (value > cur) {
min = mid+1;
} else if (value < cur) {
max = mid-1;
} else {
break; }} // Make sure you find itif (value == cur) {
if(pos) *pos = mid; // set pos, return 1, find the positionreturn 1;
} else{// If not found, min and Max are adjacent, whatever the setting, and return 0if (pos) *pos = min;
return0; }}Copy the code
conclusion
This article is mainly about the SET of Redis data types of the underlying implementation integer collection, from what is integer SET first, and analyses its main constituent, then through the multiple image process diagram explains how intset upgrade, the final description of integer SET is combined with the source code, such as the creation process, the upgrade process, interspersed with examples and the process diagram.
If you feel that writing is also good, trouble to give a praise 👍, your recognition is the power of my writing!
If you think something is wrong, please feel free to comment.
All right, bye.
Thank you all ❤
1. If this article is helpful to you, please support it with a like. Your like is the motivation for my writing.
2. Follow the public account “learning Java little sister” to add my friends, I pull you into the “Java technology exchange group”, we communicate and progress together.