preface
HashMap HashMap HashMap HashMap HashMap HashMap HashMap To be honest, I have seen the design and implementation of Redis (sincerely recommend), but I can’t describe it accurately, so I write this article.
Note: Since this article focuses on the hash process, the source code section is easy to read
Hash function (hash algorithm)
-
A hash function, also known as a hash algorithm or hash function, is a method of creating small digital “fingerprints” from any kind of data, and this “fingerprint” is the hash value.
-
Hash functions can be used in a variety of fields, such as protecting data, ensuring that real information is delivered, and hash tables. This article focuses on the application of hash tables.
-
Of course, we want hash functions to ensure that each key corresponds to a “fingerprint”, that is, a hash value, the so-called perfect hash, but due to performance, application scenarios and other considerations, can accept not too many hash collisions.
-
Hash collision: There is not A unique relationship between the input and output of the hash function. For example, the hash value of the input A and B of the hash function is C.
-
Common hash functions:
Direct addressing method number analysis method square take center method fold method divide remainder method random number method
- We all know that HashMap and Redis dictionary scenarios are optimized based on the outleave remainder method as hash functions. I don’t want to belabor the concept of each method here, but I want you to look at it here.
Hash collision (hash collision) resolution
Hash? -> hash algorithm selection -> hash conflict how to resolve
Surely this is the mindset of most peers. So, what are the main algorithms for resolving hash conflicts?
Open addressing
- Once a conflict occurs, go to the next empty hash address, the simplest algorithm company is as follows.For example, SettingsIf (26+1) %12 = (37+1) %12, then (37+2) %12 = (37+1) %12, then (37+2) %12 = 4, then (37+2) %12 = 4.
Hashing again
- Prepare multiple hash functions at the same time so that in case the first hash function conflicts, the alternate hash function can be used to calculate.
Chain address method
- The hash value is first obtained by the division and remainder method. If the hash value conflicts, the nodes in the conflict are inserted one by one through the linked list to form the structure as shown in the following figure. The following example is,You can see that 48%12 = 12%12 = 0, forming a linked list.
Public overflow zone method
- Use additional common storage to store elements with conflicting hash values.
HashMap
Hash algorithm of HashMap
episodeLOAD_FACTOR
(Load factor)
-
The default LOAD_FACTOR for HashMap is 0.75. What does it do? Follow the source below trace ~
-
New HashMap (capacity = 16, load_factor = 0.75);
/** * Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75). * /
public HashMap(a) {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
Copy the code
- As the
put
Element, must be expanded. Lookresize()
Function, refers to post the iconic part.
else { // zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
Copy the code
- As you can see, the threshold is passed
capacity
Multiplied by theload_factor
The result is, namely16 * 0.75 = 12
.HashMap
If the number of elements in theIf the threshold is small relative to the capacity, the probability of a hash collision is reduced, because the probability of a hash collision with 16 elements is smaller than the probability of a hash with 12 elements given the same hash function.
Hash function formula and analysis
- The hash algorithm of HashMap approximates the division and remainder method, but it is not useful
MOD
I’m going to do it in bits, let’s sayIs the input value and the hash function isThe specific formula is as follows:
f(key) = hash(key) & (table.length - 1)
hash(key) = (h = key.hashCode()) ^ (h >>> 16)
Copy the code
hash(key) & (table.length - 1)
Is thetable.length
The optimized version of the mod, which actually does the same thing, is based on the division remainder method. Due to theHashMap
The feature is expanded each timetable.length
It would be, so bit operations are significantly more efficient.>>>
Is the unsigned right shift operator. We know thathashCode()
It’s a very wide range of values, so it’s very unlikely to conflict with each other, but it’s uptable.length - 1
The odds are going to go up becausetable.length
Is a smaller value. That’s why it works, right> > > 16
The reason,hashCode()
The high and the low are both rightf(key)
With a certain amount of influence, it makes the distribution more uniform, so there’s less chance of hash collisions.
HashMap hash conflict resolved
HashMap
The hashing conflict resolution method is obviousChain address methodIn fact, you can see it in the structure.- Available from the
HashMap
theresize()
Method can also be seen, see part of the source code… Comments on.
if(oldTab ! =null) {
// Iterate over the nodes on the old array
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if((e = oldTab[j]) ! =null) {
oldTab[j] = null;
// The current list has only one node (no hash collisions)
if (e.next == null)
// The hash algorithm is used to calculate the storage location and place
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
// The red black tree went to...
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
// Low order list, high order list
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
// Iterate over the list with hash conflicts
next = e.next;
// Hash values less than the old array capacity are put into the low list
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
// Hash values greater than or equal to the size of the old array are put into the high-order list
else {
if (hiTail == null)
hiHead = e;
elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
// Place the list under the original index
if(loTail ! =null) {
loTail.next = null;
newTab[j] = loHead;
}
// Place the list in the old index + old array size
if(hiTail ! =null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
Copy the code
Redis dictionary
Introduction to Data Structures
withHashMap
About the same place
First of all, I will briefly explain the data structure of Redis dictionary, which is the C language that produced shadows when I was in college.
typedef struct dictht {
// Hash table array
dictEntry **table;
// Hash table size
unsigned long size;
// Hash table size mask, used to calculate index values
// always equal to size-1
unsigned long sizemask;
// The number of nodes in the hash table
unsigned long used;
} dictht
Copy the code
- There’s an array of hash tables, doesn’t that look familiar, and
Node<K,V>[]
So let’s seedictEntry
This class.
typedef struct dictEntry {
/ / key
void *key
/ / value
union {
void *val;
unit64_tu64;
int64_ts64;
} v
/ / next pointer
struct dictEntry *next;
}
Copy the code
-
That looks familiar again, key-value pairs! The key property holds the key in the key-value pair, while the V property holds the value, which can be a pointer, unit64_t, or int64_t integer. Next is a pointer to another hash table node that joins the same key-value pairs of multiple hash tables into a linked list.
-
As you can see, the HashMap is basically the same so far, except for a few additional attributes (hash table size, number of nodes in use, mask, and so on).
The difference
- The bottom layer of the dictionary is implemented by the hash table structure, so what is the real face of the dictionary?
typedef struct dict {
// Type specific functions
dictType *type;
// Private data
void *privdata;
// hash table
dictht ht[2];
/ / rehash index
// When rehash is not in progress, the value is -1
int trehashidx;
}
Copy the code
- At this point, the data structure of Redis dictionary is described. The following figure shows the dictionary in normal state.
Redis dictionary hash algorithm
- To calculateHash valueThe process of calculating a hash for a dictionary-based hash function:
hash = dict -> type->hashFunction(key)
- Using the sizemask attribute of the hash table and the hash value, calculate the index value, depending on the case
ht[0]
orht[1]
. In fact, andHashMap
thehash(key) & (table.length - 1)
Same thing, because it says it in the notessizemask
Always equal to thesize - 1
.
index = hash & dict->ht[x].sizemask
- The hash calculation algorithm used by Redis is
MurmurHash2
I am not in a position to say,Give the connection.
Redis dictionary hash conflict resolved
- Redis hash tables are also usedChain address method, each node has one
next
Pointer, multiple nodes form a one-way linked list, andHashMap
The difference is because there is no table tail pointer to useThe first interpolationAdds a new node to the head position of the linked list.
Redis is different from HashMap
See hash algorithm, hash conflict resolution, there is not much difference, so what is the difference? That’s rehash.
Rehash
-
As mentioned earlier, there are two hash tables in the dictionary data structure (HT [2]), and that’s the secret.
-
The purpose of Rehash is to keep the load factor of a hash table within a reasonable range. A hash table needs to expand or contract when there are too many or too few key-value pairs.
-
The steps are as follows:
- For a dictionary
ht[1]
Hash table allocates space, if executedextensionOperation, thenht[1]
The magnitude of theta is greater than or equal to the first oneht[0].used *2
the; Student: If you doshrinkageOperation, thenht[1]
The first one is greater than or equal toht[0].used
the. - Will be stored in the
ht[0]
All key-value pairs inrehash
toht[1]
Above, the key hash and index values are recalculated, and the key-value pair is placed intoht[1]
Hash table at the specified location. - when
ht[0]
All of the included key-value pairs have been migrated toht[1]
After (ht[0]
Becomes an empty hash tableht[0]
That will beht[1]
Set toht[0]
And, inht[1]
Create a new blank hash table for next timerehash
To prepare for.
- For a dictionary
-
Note: Progressive Rehash is not covered in this paper, the author is really not motivated to draw, the data structure is complicated, please understand.
conclusion
This article from the hash algorithm, hash collision solution, a brief analysis of the HashMap, Redis dictionary hash, views are not necessarily correct, please comment!
reference
- zh.wikipedia.org/wiki/ hash function
- Redis design and implementation
- www.zhihu.com/question/26…