Preface

This time, I will study HashMap with you. HashMap is used frequently at work and comes up often in interviews, because it touches on many fundamentals and is a good test of one's foundation. So why didn't I cover something this important earlier? Because HashMap is built on several basic data structures and code design ideas, and learning those basics first makes HashMap much easier to understand. As the ancients said: seek no quick results and covet no small gains. More haste, less speed.

A HashMap essentially combines the ideas behind arrays (ArrayList) and linked lists (LinkedList) with the hashCode and equals methods. If any of that is unfamiliar, please read my earlier articles first.

This article analyzes HashMap based on the JDK 8 source code (with JDK 7 for comparison). The Q&A below is only a summary; for a thorough and readable analysis of HashMap, which is also the article I mainly learned from, I recommend "Rediscovering HashMap" in the Java 8 series by the Meituan-Dianping technology team.

This article was also published on Jianshu: www.jianshu.com/p/32f67f9e7…

Q&A content

1.

Q: Have you used HashMap? Can you tell me how it is mainly used?

A:

  • Yes, I use the HashMap data structure frequently in my daily work. HashMap is a key-value (<key, value>) storage structure based on the Map interface. It allows null keys and values, is unordered, and is not synchronized (i.e., not thread-safe). The underlying implementation of HashMap is array + linked list + red-black tree (the red-black tree part was added in JDK 1.8). It stores and finds data by computing a concrete storage position from the hashCode of the key. HashMap allows at most one entry whose key is null. Routine operations such as add, delete, update, and lookup all perform well; HashMap can be seen as a compromise between the ArrayList and LinkedList data structures.

Sample code:

// Create a HashMap; if no initial size is specified, the underlying hash table array defaults to 16
HashMap<String, String> hashMap = new HashMap<String, String>();

// Add elements to the container
hashMap.put("Xiao Ming", "Nice");
hashMap.put("Wang", "Rip me off.");
hashMap.put("Brother", "There's nothing wrong with it.");
hashMap.put("Nuggets", "Good place");
hashMap.put("Fifty", "Don't mess it up.");

// Get the element whose key is "Xiao Ming"
String element = hashMap.get("Xiao Ming");
// value: Nice
System.out.println(element);

// Remove the element whose key is "Fifty"
String removedElement = hashMap.remove("Fifty");
// value: Don't mess it up.
System.out.println(removedElement);

// Change the value of the element whose key is "Xiao Ming"
hashMap.replace("Xiao Ming", "It's actually kind of ugly.");
System.out.println(hashMap);

// The put method can also be used to modify the value of an existing key
hashMap.put("Xiao Ming", "It's actually okay. I'm just kidding.");
System.out.println(hashMap);

// Check whether an element whose key is "Wang" exists
boolean isExist = hashMap.containsKey("Wang");
// true
System.out.println(isExist);

// Check whether there is a value equal to "Rip me off."
boolean isHasSomeOne = hashMap.containsValue("Rip me off.");
// true
System.out.println(isHasSomeOne);

// size: 4
System.out.println(hashMap.size());
  • The underlying implementation of HashMap in JDK 1.8 is array + linked list + red-black tree. The core fields are:
  1. int size; Records the number of key-value pairs actually stored in the HashMap.

  2. float loadFactor; The load factor (the default is 0.75, explained later).

  3. int threshold; The resize threshold (threshold = capacity × load factor). Expansion is triggered when size reaches this threshold; in other words, the larger the load factor, the more key-value pairs the container can hold at a given capacity.

  4. Node<K,V>[] table; The underlying array that acts as the hash table and stores the Node<K,V> elements at their hashed positions. Its length is always a power of two. (The specific reason is explained later.)

Sample code:

public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {
    ...
    /* ---------------- Fields -------------- */

    /**
     * The hash table, initialized on first use and resized as necessary.
     * When allocated, its length is always a power of two.
     */
    transient Node<K,V>[] table;

    /**
     * The number of key-value pairs stored in this map.
     */
    transient int size;

    /**
     * The next size value at which to resize
     * (threshold = capacity * load factor).
     * @serial
     */
    int threshold;

    /**
     * The load factor of the hash table.
     * @serial
     */
    final float loadFactor;
    ...
}
  • Within Node<K,V>[] table, the element stored in the hash table is Node<K,V>, which contains:
  1. final int hash; The hash value of the element, which determines its position in Node<K,V>[] table. Once set, it cannot be modified.

  2. final K key; The key, which cannot be changed once set.

  3. V value; The value.

  4. Node<K,V> next; A reference to the next element node (a singly linked list structure used to resolve hash collisions).

Sample code:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;     // The hash value of the element; once set, it cannot be modified
    final K key;        // The key; once set, it cannot be modified
    V value;            // The value
    Node<K,V> next;     // Reference to the next element node (singly linked list, used to resolve hash collisions)

    /**
     * Node constructor
     */
    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    /**
     * Overridden hashCode method for Node, whose value is:
     * hashCode of the key XOR hashCode of the value.
     * In the XOR of the two binary values, identical bits become 0 and different bits become 1.
     */
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    /**
     * Sets a new value for the element and returns the old one.
     */
    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    /**
     * Overridden equals method for Node.
     */
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

HashMap memory structure – Image from “Meituan Dianping Technical team article”

2.

Q: Can you talk about the underlying implementation of common HashMap operations? For example, put(K key, V value) for storing, get(Object key) for searching, remove(Object key) for deleting, and replace(K key, V value) for updating.

A:

  • Calling put(K key, V value) to add a key-value pair performs the following operations:
  1. Check whether the hash table Node<K,V>[] table is empty or null. If so, call resize() to initialize (or expand) it.

  2. Based on the hash value of the key to insert, compute the storage position table[i] via (n - 1) & hash, i.e., the element's hash value ANDed with (table length - 1), which is equivalent to hash % table length. If there is no element at that position, store the new node directly at table[i].

  3. If a key-value pair already exists at that position, check whether its hash value and key are the same as those of the element being inserted. If they are, this put is a modification: the existing value is overwritten with the new value.

  4. If an element exists at the position but is not the same as the element being inserted, a hash collision has occurred at table[i]. If the head node is a TreeNode, the structure at this position is a red-black tree, and the node is added using the red-black tree insertion method.

  5. If it is not a red-black tree, it is a singly linked list. Append the new node to the end of the list, then check whether the list length is now greater than or equal to 8; if so, convert the list at this position into a red-black tree. If the key already exists somewhere in the list, its value is overwritten.

  6. If the number of stored key-value pairs exceeds threshold, expand the hash table.

Flow chart of the HashMap put method implementation

Sample code:

/**
 * Adds a key-value pair.
 *
 * @param key   the key
 * @param value the value
 * @return the previous value associated with key, or null if there was none
 */
public V put(K key, V value) {
    // Calls putVal to add the key-value pair
    return putVal(hash(key), key, value, false, true);
}

/**
 * Generates the hash value from the key's hashCode via a "perturbation function".
 * This makes the hash values produced for different keys spread more evenly,
 * reducing collisions between elements (more on that later).
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

/**
 * The method that actually adds the key-value pair (important).
 *
 * @param hash         the hash value of the key
 * @param key          the key
 * @param value        the value
 * @param onlyIfAbsent if true, do not change the value when the key already exists
 * @param evict        if false, the hash table is in creation mode
 * @return the old value, or null if there was none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab;   // records the current hash table
    Node<K,V> p;       // the node at the computed position
    int n, i;          // n records the table length, i the index
    // If the current hash table is empty,
    // initialize it via resize() and record the new length
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 1) (n - 1) & hash: the element's hash value ANDed with (table length - 1)
    // 2) determines the element's storage position, equivalent to hash % table length
    // 3) if no element exists at the computed position, store the new node there
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // An element already exists at the computed position
        Node<K,V> e; K k;
        // 1) If the existing node's hash equals the new element's hash
        // 2) and the keys are equal, this is a modification of the same key
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;  // record the existing node in e
        // If the head node is a TreeNode, this position is a red-black tree:
        // add the node using the red-black tree insertion method
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // Otherwise a hash collision has occurred and this position is a singly linked list
            for (int binCount = 0; ; ++binCount) {
                // Traverse the list and append the new element node at the end
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // After adding the node, if the list length is >= the threshold 8,
                    // convert the list into a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // If the key already exists in the list, stop: its value will be overwritten
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // an existing mapping for the key was found
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                // If modification is allowed, overwrite the value with the new value
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // If the number of stored key-value pairs exceeds the threshold, expand the table:
    // resize() copies the data from the old hash table into the new one
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    // Returning null indicates a new insertion rather than a modification
    return null;
}
  • Calling get(Object key) to find the key-value pair for a given key performs the following operations:

1. Call the hash(key) method to compute the hash value of the key.

2. Based on the hash value of the key, compute the storage position table[i] via (n - 1) & hash, i.e., the element's hash value ANDed with (table length - 1), which is equivalent to hash % table length, and check whether anything is stored at that position.

  • If elements are stored at that position, the head node is compared first: if the hash value of the head node's key equals the hash value of the key being looked up, and the head node's key itself equals the key being looked up, the head node at that position is returned.
  • If there are no elements at that position, null is returned.

3. If elements are stored at that position but the head node is not the element being looked up, the elements at that position must be traversed.

4. Check whether the head node is a TreeNode. If it is, the structure at this position is a red-black tree, and the node for the key is obtained via the red-black tree lookup.

5. If it is not a red-black tree, it is a singly linked list. Traverse the list; if a node's key has a hash value equal to that of the key being looked up, and the node's key itself equals the key being looked up, return that node. If the traversal ends without finding a matching node, return null.

Sample code:

/**
 * Returns the value to which the specified key is mapped,
 * or null if this map contains no mapping for the key.
 *
 * More precisely, if this map contains a mapping from key k to value v
 * such that (key==null ? k==null : key.equals(k)), this method returns v;
 * otherwise it returns null. (There can be at most one such mapping.)
 *
 * A return value of null does not necessarily indicate that the map contains
 * no mapping for the key; it is also possible that the map explicitly maps
 * the key to null. The containsKey operation may be used to distinguish the two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    // 1. First call hash(key) to compute the hash value of the key
    // 2. Then call getNode to get the node mapped to the key
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

/**
 * Implementation of the hash table node lookup.
 *
 * @param hash the hash value of the key
 * @param key  the key
 * @return the corresponding node, or null if the node does not exist
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab;        // records the current hash table
    Node<K,V> first, e;     // first records the head node at the computed position
    int n;                  // n records the table length
    K k;
    // (n - 1) & hash: the element's hash value ANDed with (table length - 1)
    // determines the storage position; check whether anything is stored there
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Elements exist at this position:
        // if the hash value of the head node's key equals the hash of the key to find,
        // and the head node's key itself equals the key to find
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            // return the head node at this position
            return first;
        if ((e = first.next) != null) {
            // The head node does not match
            if (first instanceof TreeNode)
                // The current position has been converted to a red-black tree:
                // get the node for the key via the red-black tree lookup
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // Not a red-black tree, so it is a singly linked list: traverse it,
                // comparing each node's hash and key with the key being looked up
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    // If nothing is found, return null
    return null;
}
  • Calling remove(Object key) to delete the key-value pair for a given key performs the following operations:

1. Call the hash(key) method to compute the hash value of the key.

2. Based on the hash value of the key, compute the storage position table[i] via (n - 1) & hash, i.e., the element's hash value ANDed with (table length - 1), which is equivalent to hash % table length, and check whether anything is stored at that position.

  • If elements are stored at that position, the head node is compared first: if the hash value of the head node's key equals the hash value of the key to delete, and the head node's key itself equals the key to delete, the head node is the node to delete, and it is recorded in the variable node.

  • If there are no elements at that position, there is no node to delete, and null is returned.

3. If elements are stored at that position but the head node is not the element to delete, the elements at that position must be traversed.

4. Check whether the head node is a TreeNode. If it is, the structure at this position is a red-black tree, and the node to delete is located via the red-black tree lookup.

5. If it is not a red-black tree, it is a singly linked list. Traverse the list and compare the nodes one by one: if a node's key has a hash value equal to that of the key to delete, and the node's key itself equals the key to delete, that node is the node to delete, and it is recorded in the variable node. If the traversal ends without finding it, null is returned.

6. If the node to delete was found, check whether the value needs to be compared as well and, if so, whether it matches. If the values match or no comparison is required, delete the node:

  • If the node is a TreeNode, the structure at this position is a red-black tree, and the node is removed via the red-black tree deletion.

  • If it is not a red-black tree, it is a singly linked list. If the node to delete is the head node, point table[i] at the node following the node to delete.

  • If the node to delete is not the head node, link its predecessor to its successor by assigning the next field, i.e., p.next = node.next;.

7. Decrement the number of key-value pairs stored in the HashMap by 1 and return the deleted node.

Sample code:

/**
 * Removes the mapping for the specified key from this map if present.
 *
 * @param key the key whose mapping is to be removed from the map
 * @return the previous value associated with key, or null if there was no mapping.
 *         (A null return can also indicate that the map previously associated null with the key.)
 */
public V remove(Object key) {
    Node<K,V> e;
    // 1. First call hash(key) to compute the hash value of the key
    // 2. Then call removeNode to delete the node mapped to the key
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

/**
 * Implementation of the hash table node removal.
 *
 * @param hash       the hash value of the key
 * @param key        the key
 * @param value      the value to compare when matchValue is true; otherwise ignored
 * @param matchValue if true, remove only when the value is equal
 * @param movable    if false, do not move other nodes while removing
 * @return the removed node, or null if nothing was removed
 */
final Node<K,V> removeNode(int hash, Object key, Object value,
                           boolean matchValue, boolean movable) {
    Node<K,V>[] tab;   // records the current hash table
    Node<K,V> p;       // the node at the computed position
    int n, index;      // n records the table length, index the current position
    // (n - 1) & hash: the element's hash value ANDed with (table length - 1)
    // determines the storage position; check whether anything is stored there
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (p = tab[index = (n - 1) & hash]) != null) {
        Node<K,V> node = null, e; K k; V v;
        // If the hash value of the head node's key equals the hash of the key to delete,
        // and the head node's key itself equals the key to delete,
        // the head node is the node to delete
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // record the reference to the node to delete in node
            node = p;
        else if ((e = p.next) != null) {
            // The head node does not match
            if (p instanceof TreeNode)
                // The current position has been converted to a red-black tree:
                // locate the node for the key via the red-black tree lookup
                // and record the reference to the node to delete in node
                node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
            else {
                // Not a red-black tree, so it is a singly linked list
                do {
                    // Traverse the list, comparing each node's hash and key
                    // with the key to delete
                    if (e.hash == hash &&
                        ((k = e.key) == key ||
                         (key != null && key.equals(k)))) {
                        // Found the node to delete: record it and stop the traversal
                        node = e;
                        break;
                    }
                    p = e;
                } while ((e = e.next) != null);
            }
        }
        // If the node to delete was found, check whether the value needs to match
        if (node != null && (!matchValue || (v = node.value) == value ||
                             (value != null && value.equals(v)))) {
            // The values match, or no value comparison is required
            if (node instanceof TreeNode)
                // The current position is a red-black tree:
                // remove the node via the red-black tree deletion
                ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
            else if (node == p)
                // The node to delete is the head node:
                // point table[index] at the next node
                tab[index] = node.next;
            else
                // Otherwise link the predecessor to the successor
                p.next = node.next;
            ++modCount;
            // Decrement the number of key-value pairs currently stored
            --size;
            afterNodeRemoval(node);
            // Return the deleted node
            return node;
        }
    }
    // If there is no node to delete, return null
    return null;
}
  • Calling replace(K key, V value) to find the key-value pair for a given key and replace its value performs the following operations:
  1. Call the hash(key) method to compute the hash value of the key.

  2. Call the getNode method to get the node mapped to the key.

  3. Record the element's old value, assign the new value to the element, and return the old value; if no element is found, return null.

Sample code:

/**
 * Replaces the value mapped to the specified key.
 *
 * @param key   the key of the element whose value is to be replaced
 * @param value the new value for the element
 * @return the old value, or null if no element was found
 * @since 1.8 (new method in JDK 1.8)
 */
public V replace(K key, V value) {
    Node<K,V> e;
    // 1. First call hash(key) to compute the hash value of the key
    // 2. Then call getNode to get the node mapped to the key
    if ((e = getNode(hash(key), key)) != null) {
        // The corresponding element was found: record its old value
        V oldValue = e.value;
        // Assign the new value to the element
        e.value = value;
        afterNodeAccess(e);
        // Return the old value
        return oldValue;
    }
    // If no element was found, return null
    return null;
}

3.

Question 1: As you mentioned above, when storing an element you compute its hash value to determine its storage position and then place the element there. What if an element already exists at that position? What happens to the new element?

Question 2: What is a hash conflict (also called a hash collision)? Why does it happen, and how can it be resolved?

A:

  • Hash collision: when we call put(K key, V value) to add a key-value pair, the storage position is determined by first computing the key's hash value with the perturbation function (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16), and then taking that hash value modulo the length of Node<K,V>[] table. So when multiple elements are put, it is possible for different keys to compute the same position. This phenomenon is called a hash conflict, or hash collision.

  • For example: the hash value of element A is 9 and that of element B is 17, and the hash table Node<K,V>[] table has length 8. The storage position of element A is 9 % 8 = 1 and the storage position of element B is 17 % 8 = 1, so both elements land in table[1], causing a hash conflict.

  • Avoiding hash collisions: Since hash collisions can occur, we should try to avoid them. The key to solving this problem is how to generate the hash value of the element. Java uses “perturbation functions” to generate hash values for elements.

Sample code:

/**
 * The hash method in JDK 7 (core perturbation steps)
 */
final int hash(Object k) {
    int h = 0;
    h ^= k.hashCode();

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

/**
 * The hash method in JDK 8
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Java 7 does four rounds of shift-and-XOR mixing. In Java 8 this step was simplified to a single 16-bit right-shift XOR mix, but the principle is the same. An example follows:

Example of the perturbation function execution – image from Zhihu

Shifting right by 16 bits is exactly half of 32 bits, so XORing the high half with the low half mixes the high and low bits of the original hash code, increasing the randomness of the low bits. The mixed low bits also carry some characteristics of the high bits, so the high-bit information is preserved in a roundabout way.
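To see why this matters, here is a minimal standalone sketch (not JDK source; the table length 16 and the two example hash codes are made-up values) showing two hash codes that differ only in their high 16 bits: without the XOR mix they land in the same bucket, with it they separate.

public class HashSpreadDemo {
    // The same spreading step HashMap.hash(Object) applies to a non-null key's hashCode
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;          // hypothetical table length (a power of two)
        int h1 = 0x10000;    // two example hash codes that differ only in their high bits
        int h2 = 0x20000;

        // Without the mix, only the low 4 bits decide the index, so both fall into bucket 0
        System.out.println((h1 & (n - 1)) + " and " + (h2 & (n - 1)));                 // 0 and 0

        // With the mix, the high bits influence the index and the two keys separate
        System.out.println((spread(h1) & (n - 1)) + " and " + (spread(h2) & (n - 1))); // 1 and 2
    }
}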

The explanation of the perturbation function above comes from: What is the principle of the HashMap hash method in the JDK source code?

  • Resolving hash conflicts: there are many ways to resolve hash conflicts. Common ones are open addressing, rehashing, separate chaining (the chained-address method), and a public overflow area (see my article "JAVA Basics – Ask yourself hashCode and equals" for details). HashMap uses separate chaining: when an element whose hash conflicts is inserted, it is appended to the end of the linked list at that position, forming a singly linked list. But because it is a singly linked list, every time hash % length locates that position, the list must be traversed from the head, comparing hash values one by one to find the element. If there are too many elements at that position, the list becomes too long, the traversal time grows significantly, and the worst-case time complexity is O(N), making lookups inefficient. So when the list at a position has length greater than or equal to 8, HashMap converts the list into a red-black tree, whose worst-case lookup time is O(log n), in order to improve lookup efficiency.

4.

Q: Why must the capacity of a HashMap be a power of 2 (2^n)?

A:

  • When put(K key, V value) adds a key-value pair, the element's position is conceptually computed as hash % length of Node<K,V>[] table. But the modulo operation is relatively expensive; the bitwise operation h & (length - 1) yields the same position and is much faster. However, h & (length - 1) is equivalent to h % length only when length is a power of 2 (see the small demonstration after this answer's bullet points).

  • In addition, when the array length is a power of 2, the probability that different keys compute the same index is smaller, so the data is distributed more evenly across the array, i.e., collisions are less likely. Accordingly, lookups are less likely to need to traverse a linked list at a position, so query efficiency is higher.

Example:

Hash & (length-1) Operation process.jpg

  • In the figure above, the two groups on the left use an array length of 16 (2^4) and the two groups on the right use a length of 15. In both cases the two hash values being placed are 8 and 9.

  • When the array length is 15, the hash values are ANDed with 14 (1110): 8 (1000) & 1110 = 1000 and 9 (1001) & 1110 = 1000, so both elements are stored at the same position table[8]. This causes a hash conflict, so queries have to walk the linked list and compare hash values one by one, which reduces query efficiency.

  • We can also see that when the array length is 15, the hash value is always ANDed with 14 (1110), so the last bit of the index is always 0, and positions 0001, 0011, 0101, 1001, 1011, 0111, and 1101 can never hold elements. That wastes a considerable amount of space and, worse, leaves far fewer usable slots than the array length, which further increases the chance of collisions and slows down queries.

  • So the capacity of a HashMap is a power of 2, which both speeds up the calculation of element positions and reduces the probability of hash conflicts. Therefore, when we use a HashMap to store a lot of data, it is best to specify a power-of-2 container size up front. Even if we don't, HashMap will round the requested size up to the nearest power of 2; for example, if we ask for a HashMap of size 7, the container size is set to 8, the nearest power of 2 above 7.

The above answers are from: Understand HashMap in depth
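A quick sketch of the equivalence mentioned above (the hash values and the length 16 below are arbitrary illustration values): for a power-of-two length, the bit mask h & (length - 1) produces the same index as h % length for non-negative hash values.

public class PowerOfTwoIndexDemo {
    public static void main(String[] args) {
        int length = 16;                      // a power-of-two table length
        int[] hashes = {8, 9, 17, 123456};    // arbitrary non-negative hash values
        for (int h : hashes) {
            // The mask keeps only the low bits, which is exactly what the modulo does
            System.out.println(h + ": h & (length - 1) = " + (h & (length - 1))
                    + ", h % length = " + (h % length));
        }
    }
}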

Sample code:

    /**
     * Returns a power of two size for the given target capacity cap.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

5.

Q: What is the load factor of a HashMap and what does it do?

A: The load factor represents the extent to which the hash table space is used (or the utilization of the hash table space).

  • For example: with capacity = 16 and load factor = 0.75, once the number of stored elements reaches size = 16 × 0.75 = 12, the HashMap expansion mechanism is triggered and resize() is called to expand the capacity.

  • The larger the load factor, the more heavily loaded the HashMap can become; in other words, it can hold more elements before resizing. But with more elements, the probability of hash collisions rises and the linked lists grow longer, which lowers query efficiency.

  • When the load factor is smaller, the data in the buckets is sparser, which wastes some space but keeps query efficiency high.

  • We can adjust the load factor as needed when creating the HashMap: if the program cares more about space overhead and memory is tight, the load factor can be increased; if it cares more about time cost and memory is plentiful, the load factor can be decreased. Usually there is no need to change it: the default load factor (0.75) is a good compromise between time and space costs.

  • Therefore, if we know roughly how many key-value pairs a HashMap will hold when we initialize it, we can compute the initial capacity as initialCapacity = expected size / load factor. This prevents the HashMap from repeatedly calling resize() as the number of elements reaches successive thresholds, and thus gives better performance, as sketched below.
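A small sketch of that sizing rule (the expected entry count of 1000 is just an assumed example):

import java.util.HashMap;
import java.util.Map;

public class InitialCapacityDemo {
    public static void main(String[] args) {
        int expectedEntries = 1000;                                // assumed number of key-value pairs
        // expected size / load factor, so the threshold stays above the expected size
        int initialCapacity = (int) (expectedEntries / 0.75f) + 1; // 1334; HashMap rounds it up to 2048
        Map<Integer, String> map = new HashMap<>(initialCapacity);
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);                              // no capacity-doubling resize is needed while filling
        }
        System.out.println(map.size());
    }
}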

6.

Q: Can you tell us the differences between HashMap and Hashtable?

A: HashMap and Hashtable differ in the following ways:

1) Overall container structure:

  • In a HashMap, both keys and values may be null. When a HashMap encounters a null key, it handles it specially (in JDK 1.7 by calling putForNullKey), while null values need no special handling.

  • In a Hashtable, neither keys nor values may be null. When a Hashtable encounters a null key or value, it throws a NullPointerException (a small sketch follows at the end of this answer).

2) Capacity setting and expansion mechanism:

  • By default, the initial capacity of a HashMap is 16, and the capacity must always be a power of 2. When it is expanded, the capacity doubles.

  • By default, the initial capacity of a Hashtable is 11. When it is expanded, the capacity becomes twice the old capacity plus 1: int newCapacity = (oldCapacity << 1) + 1;.

3) Hash distribution (calculating the storage position):

  • In HashMap, the key's hashCode is first perturbed by the perturbation function to obtain the hash value, and the element's storage position is then obtained with hash & (length - 1) instead of a modulo operation.

  • Hashtable computes the position by division and remainder (since its default capacity is not a power of 2): int index = (hash & 0x7FFFFFFF) % tab.length;.

  • Because the capacity of a HashMap is always a power of 2, hash & (length - 1) can replace the modulo operation when computing positions, which is more efficient. The capacity of a Hashtable is not necessarily a power of 2, so it cannot use this shortcut.

4) Thread safety (most important):

  • HashMap is not thread-safe. It can be wrapped to be thread-safe with Collections.synchronizedMap(Map<K,V> m), but that wrapper is inefficient, so ConcurrentHashMap is recommended for thread safety instead.

  • Hashtable is thread-safe: each of its operation methods is marked synchronized. But this makes it slow, so ConcurrentHashMap is again the recommended choice when thread safety is needed.

Therefore, Hashtable is a legacy container, and it is recommended to use HashMap if we do not need thread synchronization, or ConcurrentHashMap if thread synchronization is required.
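A minimal sketch of the null-key difference described in point 1) (the keys and values are arbitrary):

import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> hashMap = new HashMap<>();
        hashMap.put(null, "null key is allowed");   // hash(null) is 0, so it lands in bucket 0
        hashMap.put("key", null);                   // null values are allowed as well
        System.out.println(hashMap.get(null));

        Map<String, String> hashtable = new Hashtable<>();
        try {
            hashtable.put(null, "value");           // Hashtable rejects null keys
        } catch (NullPointerException e) {
            System.out.println("Hashtable throws NullPointerException for a null key");
        }
    }
}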

The Hashtable source code is not analyzed here; readers who want to learn more can refer to the article "Hashtable source code analysis".

7.

Q: You said that HashMap is not thread-safe. What does it do about multithreading, and in what situations do thread-safety problems occur?

A:

  • HashMap is not thread-safe: if multiple threads modify the same HashMap at the same time, the data can become inconsistent or corrupted. To guard against unsafe operations as far as possible, HashMap throws a ConcurrentModificationException: while a HashMap is being iterated, it must not be structurally modified (by adding, deleting, and so on), otherwise the exception is thrown. This is the fail-fast mechanism. Structural modifications such as put and remove increment modCount, so when the iterator detects expectedModCount != modCount it throws ConcurrentModificationException (see the sketch after this list). For thread safety, consider using ConcurrentHashMap.

  • In addition, when a HashMap is operated on by multiple threads, the expansion mechanism means that a call to resize() for automatic expansion can end up in an infinite loop.
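A minimal sketch of the fail-fast behaviour mentioned above (the three entries are arbitrary; removing an entry through the map itself while iterating makes the iterator detect expectedModCount != modCount on its next step):

import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", "1");
        map.put("b", "2");
        map.put("c", "3");
        try {
            for (String key : map.keySet()) {
                if ("b".equals(key)) {
                    // Structural modification during iteration increments modCount
                    map.remove(key);
                }
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast: the iterator detected a concurrent modification");
        }
    }
}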

Due to lack of time, I will not walk through the cause of the infinite loop triggered by the resize() method here; I will add it later when I have time. Please bear with me, and refer to the following articles in the meantime:

The Java 8 series reimagines HashMap

Talk about how the HashMap thread is unsafe

8.

Q: What kinds of objects should we choose as keys when using a HashMap, and why?

A:

  • A mutable object is an object whose state can change after it is created; in other words, its hash value may change after creation.

  • When using a HashMap, it is best to choose immutable objects as keys. Immutable types such as String and Integer are wise choices for keys.

  • If the key object is mutable, its hash value may change, and using such keys in a HashMap can cause data loss: the entry was stored at a position computed from the old hash, so when we later compute the position with hash & (length - 1) to look it up, the position may have changed and the entry can no longer be found. See the sketch below.
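A small sketch of that data loss (MutableKey is a hypothetical class written only for this illustration):

import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    // A hypothetical mutable key whose hashCode depends on a mutable field
    static class MutableKey {
        int id;
        MutableKey(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");

        key.id = 2;  // the key's hash changes, but the entry stays in the bucket chosen for the old hash

        // Lookups now compute the bucket from the new hash, so the entry can no longer be found
        System.out.println(map.get(key));                // null
        System.out.println(map.get(new MutableKey(1)));  // also null: the stored key no longer equals it
    }
}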

For detailed examples, please refer to: Danger! Use mutable objects as keys in a HashMap

Conclusion

  1. HashMap is a key-value (<key, value>) storage structure based on the Map interface. It allows null keys and values, is unordered, and is not synchronized (i.e., not thread-safe). Its underlying implementation is array + linked list + red-black tree (the red-black tree was added in JDK 1.8).

  2. The hash value is obtained by passing the key's hashCode through the perturbation function, and the element's position is then computed with hash & (length - 1) instead of a modulo operation.

  3. HashMap uses separate chaining (the chained-address method) to resolve hash conflicts: when a conflicting element is put, it is appended to the end of the linked list at that position, forming a singly linked list. When the list at a position reaches length 8 or more, HashMap converts it into a red-black tree to improve lookups.

  4. The capacity of a HashMap is a power of 2, which speeds up the calculation of element positions and reduces the probability of hash collisions. So when we use a HashMap to store a large amount of data, it is best to specify a power-of-2 container size in advance; even if we don't, HashMap rounds the requested size up to the nearest power of 2. For example, if we ask for a HashMap of size 7, the container size is set to 8.

  5. The load factor of a HashMap indicates how full the hash table is allowed to get (the utilization of the hash table space). The larger the load factor, the more elements the HashMap can hold before resizing, but the probability of hash collisions rises and the linked lists grow longer, lowering query efficiency. The smaller the load factor, the sparser the data, which wastes space but keeps queries fast.

  6. HashMap is not thread-safe, Hashtable is thread-safe. But Hashtable is a legacy container, and we recommend using HashMap if we don’t need thread synchronization, or ConcurrentHashMap if we do.

  7. When HashMap is operated under multiple threads, an infinite loop may occur when HashMap calls resize() for automatic expansion due to the capacity expansion mechanism.

  8. When using a HashMap, it is best to choose immutable objects as keys. Immutable types such as String and Integer are very wise to use as keys.

  • My job has kept me busy lately, and I also struggle with procrastination, so this article took a while to finish. The finished article is, honestly, not as good as I would like, but I am publishing it so people can read it, learn together, and point out what is wrong, and I will keep improving it. If this article helped you, please give it a like. Thank you.

Reference articles

Rediscovering HashMap (Java 8 series)
What is the principle of the HashMap hash method in the JDK source code?
Understanding HashMap in depth
HashMap load factor
Hashtable source code analysis
Danger! Using mutable objects as keys in a HashMap
Talking about how HashMap is not thread-safe