This article is an entry in the third "Topic Writing" track of the Nuggets Creators Camp. Check out the details: the third phase of the Nuggets program | Creators Camp is underway, "write" out your personal influence.

📖 Preface

Holding fireworks for a living and poetry for love

I'm not sure what gave me the courage to join this topic-writing event. My skills are limited, so advice is welcome; please go easy on me, and thank you!


🚀 What is HashMap

HashMap is a non-synchronized, hash-table-based implementation of the Map interface (Hashtable is very similar; the main difference is that Hashtable is thread-safe, i.e., synchronized). This implementation provides all of the optional map operations and permits null values and the null key. The class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.

Data structure:


🐱🏍 Let's look at a few questions

| Question | Conclusion |
| --- | --- |
| Does HashMap allow null? | Both Key and Value are allowed to be null |
| Does HashMap allow duplicate data? | A duplicate Key overwrites the existing entry; duplicate Values are allowed |
| Is HashMap ordered? | Unordered; in particular, traversing a HashMap almost never yields the order in which the elements were put |
| Is HashMap thread-safe? | No, it is not thread-safe |
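
A minimal sketch that demonstrates these conclusions; the iteration order is unspecified, so the exact printed order is only illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapFacts {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "null key is allowed");    // one null key is permitted
        map.put("a", null);                      // null values are permitted
        map.put("b", "first");
        map.put("b", "second");                  // duplicate key: the old value is overwritten
        System.out.println(map.get("b"));        // prints "second"
        System.out.println(map.size());          // 3 entries: null, "a", "b"
        // Iteration order is unspecified and generally differs from insertion order
        for (Map.Entry<String, String> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```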

🤳 The data structure of HashMap

In Java there are two basic storage structures: one is the array, the other is the simulated pointer (reference). All data structures can be built from these two, and HashMap is no exception. HashMap is in fact an "array of linked lists": each element of the array holds the head node of a linked list, i.e., it is a combination of array and linked list.

The bottom layer of HashMap is an array, and each slot of the array is a linked list. When a new HashMap is created, this array is initialized. The source code is as follows:

/**
 * The table, resized as necessary. Length MUST always be a power of two.
 */
transient Entry[] table;

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;
    ...
}

As you can see, Entry is the element type stored in the array. Each Map.Entry is really a key-value pair, and it holds a reference to the next element; that reference is what forms the linked list.

The bottom layer of HashMap is implemented mainly with arrays and linked lists. Its fast lookups come from computing a hash value to determine the storage location. HashMap computes the hash value from the key's hashCode; as long as the hashCode is the same, the computed hash value is the same. When many objects are stored, different objects may end up with the same hash value, and this is the so-called hash collision. HashMap resolves hash collisions at the bottom layer with linked lists; a quick web search will turn up plenty of background on the topic.
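
A small sketch of a collision, assuming a deliberately bad key type whose hashCode is constant (the BadKey class below is made up for illustration): both keys land in the same bucket, so they are chained and then distinguished by equals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CollisionDemo {
    // Hypothetical key type: every instance returns the same hashCode,
    // so all keys collide and land in the same bucket.
    static class BadKey {
        private final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }   // constant hash -> guaranteed collisions
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && Objects.equals(name, ((BadKey) o).name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2);                   // same bucket as "a": chained, not overwritten
        System.out.println(map.get(new BadKey("a")));  // 1, found on the chain via equals
        System.out.println(map.get(new BadKey("b")));  // 2
        System.out.println(map.size());                // 2 distinct keys despite identical hashCodes
    }
}
```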


🎶 The constructors of HashMap

HashMap provides three commonly used constructors:

  • HashMap(): constructs an empty HashMap with the default initial capacity (16) and the default load factor (0.75).
  • HashMap(int initialCapacity): constructs an empty HashMap with the specified initial capacity and the default load factor (0.75).
  • HashMap(int initialCapacity, float loadFactor): constructs an empty HashMap with the specified initial capacity and load factor.

Initial capacity and load factor: the capacity is the number of buckets in the hash table, the initial capacity is simply the capacity at the time the table is created, and the load factor is a measure of how full the table is allowed to get before its capacity is automatically increased; it measures how much of the table's space is in use. The larger the load factor, the fuller the table is allowed to become (and the smaller it is, the emptier the table stays). For a hash table that resolves collisions by chaining, the average time to find an element is O(1 + α), where α is the load factor. A larger load factor makes fuller use of space but lowers lookup efficiency; a load factor that is too small leaves the table sparse and seriously wastes space. The default load factor of 0.75 is a compromise between the two.

Usually the no-argument HashMap() constructor is enough.
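
A quick sketch of the three constructors in use; the capacity and load-factor values below are just illustrative assumptions:

```java
import java.util.HashMap;
import java.util.Map;

public class ConstructorDemo {
    public static void main(String[] args) {
        // Default: initial capacity 16, load factor 0.75
        Map<String, Integer> a = new HashMap<>();

        // Specified initial capacity, default load factor 0.75.
        // If we expect roughly a thousand entries, sizing up front avoids repeated resizing.
        Map<String, Integer> b = new HashMap<>(2048);

        // Specified initial capacity and load factor
        Map<String, Integer> c = new HashMap<>(64, 0.5f);

        a.put("x", 1);
        b.put("y", 2);
        c.put("z", 3);
        System.out.println(a.size() + " " + b.size() + " " + c.size());
    }
}
```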


👏 Storing and retrieving with HashMap

✨ Storage (put)

public V put(K key, V value) {
    // When the key is null, call putForNullKey and store the value at the first
    // position of the table. This is why HashMap allows a null key.
    if (key == null)
        return putForNullKey(value);
    // (1) Compute the hash value of the key
    int hash = hash(key.hashCode());
    // (2) Compute the index of that hash value in the table array
    int i = indexFor(hash, table.length);
    // Walk the chain starting at i to see whether the key already exists
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // If an identical key exists on the chain, overwrite its value
        // and return the old value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;   // return the old value
        }
    }
    // Increase the modification count by 1
    modCount++;
    // Add the key and value at index i
    addEntry(hash, key, value, i);
    return null;
}

The process of saving data in a HashMap is as follows: first, check whether the key is null. If it is, call putForNullKey directly and place the value at the first position of the array. If it is not null, recompute a hash value from the key's hashCode and use it to compute the element's position (subscript) in the table array. If other elements already sit at that position, compare keys along the chain: if an equal key exists, its value is overwritten; otherwise the new element is stored at the head of the chain (so the element stored earliest ends up at the tail). If the table has no element at that position yet, the element is placed there directly. Take a look at the following:

  1. Let's start with the iteration. The purpose of iterating the chain here is to prevent two identical keys from existing at the same time. If an equal key is found, HashMap replaces the old value with the new one and does not touch the key, which explains why a HashMap never contains two identical keys. Also note the order of the comparison: the hash (hashCode) is checked for equality first, and only then is equals called; this greatly improves the efficiency of HashMap. (Check the blogger's history articles for more on hashCode and equals; I will expand on it in a later post.)

  2. Lines (1) and (2) are the essence of HashMap. The first is the hash method, a purely mathematical computation that derives the hash value from h. The algorithm mixes in the high-order bits to reduce hash collisions in cases where the low bits are the same and only the high bits differ. (I don't dare to claim more than that, lol.)

  3. HashMap is implemented with an array plus linked lists (and red-black trees since JDK 1.8), but the bucket array that holds the key-value pairs has a fixed length determined at initialization. As more data is inserted and the load-factor threshold is exceeded, the table has to be expanded to store more data. One important optimization in JDK 1.8 is that resizing no longer needs to recalculate the hash value of every element.

static int hash(int h) {
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

We know that for HashMap's table, the data should be spread as evenly as possible (ideally each slot holds only one element, so it can be found directly): neither too crowded, which slows down queries, nor too sparse, which wastes space. After computing the hash value, how do we make sure the elements are distributed evenly across the table? We might think of taking the modulus, but because the modulo operation is relatively expensive, HashMap does it differently: it calls the indexFor method.

static int indexFor(int h, int length) {
    return h & (length-1);
}
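
A minimal sketch of why `h & (length - 1)` works: when length is a power of two, `length - 1` is an all-ones bit mask, so for a non-negative h the AND gives the same result as `h % length` but is cheaper. The concrete numbers below are just an illustration:

```java
public class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;                 // a power of two, so length - 1 = 0b1111
        int h = 0x7A35B;                 // an arbitrary hash value
        System.out.println(indexFor(h, length));  // keeps only the low 4 bits: 11
        System.out.println(h % length);           // same result as the modulus: 11
        // With a non-power-of-two length the two would not always agree,
        // which is why the table length MUST be a power of two.
    }
}
```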

🎎 The creation of chains

This is quite an elegant design. The system always places the newly added Entry object at bucketIndex. If there is already an Entry at bucketIndex, the new Entry points to the existing one, forming an Entry chain; if there is no Entry at bucketIndex (i.e., e == null), the new Entry's next points to null and no chain is formed.
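
A tiny, self-contained illustration of that head insertion (this is not the JDK source, only a sketch of the behavior described above): the new entry always becomes the head of its bucket's chain.

```java
public class ChainDemo {
    static class Entry<K, V> {
        final K key; V value; Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    static Entry<String, Integer>[] table = new Entry[16];

    static void addEntry(int bucketIndex, String key, Integer value) {
        Entry<String, Integer> e = table[bucketIndex];     // current head, may be null
        table[bucketIndex] = new Entry<>(key, value, e);   // the new entry becomes the new head
    }

    public static void main(String[] args) {
        addEntry(3, "first", 1);
        addEntry(3, "second", 2);   // collides with "first": chained in front of it
        for (Entry<String, Integer> e = table[3]; e != null; e = e.next)
            System.out.println(e.key + " = " + e.value);   // prints "second" then "first"
    }
}
```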

🎊 Expansion Problems

As the number of elements in a HashMap grows, the probability of collisions increases and the linked lists get longer and longer, which inevitably hurts the performance of HashMap. To keep performance acceptable, the table must be expanded at a certain critical point: when the number of elements in the HashMap equals table array length * load factor. But expansion is a very time-consuming process, because all of the data has to be relocated into the new table array and copied over. So if we already know roughly how many elements the HashMap will hold, presetting the capacity can effectively improve HashMap's performance.
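
A quick worked example of that critical point, using the default values (16 * 0.75 = 12):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;           // default initial capacity
        float loadFactor = 0.75f;    // default load factor
        int threshold = (int) (capacity * loadFactor);
        // 12: roughly, once more than 12 entries are stored the table is doubled
        // (the exact trigger point differs slightly between JDK versions)
        System.out.println(threshold);

        // After one resize the capacity doubles and so does the threshold
        capacity <<= 1;                                     // 32
        System.out.println((int) (capacity * loadFactor));  // 24
    }
}
```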

From the put source above we can see that when the program tries to put a key-value pair into the HashMap, it first decides the Entry's storage position from the key's hashCode() return value: if two Entry keys have the same hashCode() return value, they are stored at the same position. If the two keys also return true from equals, the newly added Entry's value overwrites the original Entry's value, but the key is not replaced. If the two keys return false from equals, the newly added Entry forms an Entry chain with the existing ones, and the new Entry sits at the head of that chain.

Source code (the comments are written directly in the code):

final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // "Cap" is short for capacity. If oldCap > 0, the table has already been initialized
    if (oldCap > 0) {
        // If the capacity has reached the maximum of 1 << 30, stop expanding
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }

        // Otherwise double both the capacity and the threshold
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold

        // When a capacity was passed to the constructor, HashMap temporarily
        // stores it in the threshold field, so on first initialization
        // newCap is taken from threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // This branch is also commented in English in the source code:
        // the no-argument constructor was used, so the bucket array capacity is the
        // default capacity 1 << 4 (16), and the threshold is the default capacity
        // times the default load factor 0.75
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }

    // If newThr is still 0, compute the threshold as newCap * loadFactor
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Create the new bucket array that will hold the entries
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // The old bucket array has entries: traverse it and remap every key into the new buckets
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    // A single node: recompute its index directly
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    // split is the red-black tree split operation, performed while remapping
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Plain linked list: split it into a "low" list that stays at index j
                    // and a "high" list that moves to index j + oldCap
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
  1. During expansion two new values are computed: newCap and newThr. These are abbreviations of capacity and threshold respectively.
  2. newCap is used to create the new bucket array: new Node[newCap].
  3. With the expansion, elements that were previously stored in linked lists or red-black trees because of hash collisions have to be split and stored in their new locations, as illustrated in the sketch below.
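
A small sketch of the split rule in the loop above: when the capacity doubles from oldCap to 2 * oldCap, the bit tested by `hash & oldCap` decides whether an element stays at its old index j or moves to j + oldCap. The hash values below are made up for illustration:

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37, 53};   // all map to index 5 when the capacity is 16

        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);          // index before resizing
            int newIndex = h & (newCap - 1);          // index after resizing
            boolean stays = (h & oldCap) == 0;        // the bit the JDK 1.8 split checks
            System.out.printf("hash=%d old=%d new=%d staysAtOldIndex=%b%n",
                    h, oldIndex, newIndex, stays);
            // newIndex is always either oldIndex or oldIndex + oldCap,
            // so the hash never has to be recomputed during the resize
        }
    }
}
```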

📖 Retrieval (get)

public V get(Object key) {
    // If the key is null, call getForNullKey to return the corresponding value
    if (key == null)
        return getForNullKey();
    // Compute the hash value from the key's hashCode
    int hash = hash(key.hashCode());
    // Walk the chain at the computed index in the table array
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
        Object k;
        // If an entry with the same key is found, return its value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

When you get an element from a HashMap, the key's hashCode is first used to find the right bucket in the array, and then the key's equals method is used to find the right element in that bucket's list.

When each bucket of the HashMap holds only a single Entry (i.e., there is no Entry chain), the HashMap has its best performance: when the program looks up a value by key, it only needs to compute the key's hashCode() return value, find the key's index in the table array from that hashCode, fetch the Entry at that index, and return the corresponding value.

If each bucket holds only one Entry, the HashMap can quickly retrieve the Entry in a bucket by its index. When there is a hash conflict, however, a single bucket stores not one Entry but an Entry chain, and the system has to traverse each Entry in order until it finds the one being searched for. If the target Entry happens to be at the end of the chain (because it was the first one put into that bucket), the loop has to run to the end before the element is found.

At the bottom layer, HashMap treats each key-value pair as a whole: an Entry object. It uses an Entry[] array to hold all of the key-value pairs. When an Entry object needs to be stored, the hash algorithm decides its position in the array and the equals method decides its position in the linked list at that array slot; when an Entry needs to be retrieved, the hash algorithm again finds its position in the array, and the equals method picks the Entry out of the list at that slot.

  • HashMap applies an extra rehash to the key's HashCode to guard against poor Hash algorithms that produce bad HashCodes. Why guard against bad HashCodes?
    • A bad HashCode means Hash conflicts, that is, multiple different keys ending up with the same HashCode. A bad Hash algorithm raises the probability of such conflicts, which degrades HashMap's performance in two ways:
    • (1) Suppose there are 10 keys and six of them share the same HashCode: the Entries of the other four keys are spread evenly across the table, while one position carries a chain of six Entries. That defeats the purpose of HashMap; the premise of this high-performance data structure is that the Entries are distributed evenly over the table positions, but now the distribution is one, one, one, one, six. So we want HashCodes with plenty of randomness, which keeps the Entry distribution as random as possible and improves HashMap's efficiency.
    • (2) With such a chain, HashMap has to traverse the linked list at that table position, running the comparison for each node: if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
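
A small sketch of what the extra rehash buys, reusing the JDK 1.6 hash function shown earlier: three hashCodes that differ only in their high bits would all land in bucket 0 of a 16-slot table without the rehash, but are spread out after it. The hashCode values are made up for illustration:

```java
public class RehashDemo {
    // The JDK 1.6 supplemental hash shown earlier: mixes the high bits into the low bits
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        // hashCodes that differ only in their high bits
        int[] hashCodes = {0x10000, 0x20000, 0x30000};
        for (int hc : hashCodes) {
            int withoutRehash = indexFor(hc, length);      // always 0: the low bits are identical
            int withRehash = indexFor(hash(hc), length);   // spread across different buckets
            System.out.printf("hashCode=0x%X plainIndex=%d rehashedIndex=%d%n",
                    hc, withoutRehash, withRehash);
        }
    }
}
```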

🎉 Finally

  • For more references, see here: Chen Yongjia's blog
  • If you liked this post, please follow and give it a like; more updates are on the way, hehe!