Preface

HashMap has long been a favorite question for interviewers. In this article, I will revisit HashMap and analyze the 1.7 and 1.8 implementations from a source-code perspective.

Preparation

Even if you select version 1.7 in Android Studio, it still shows the 1.8 HashMap source, because the 1.7 sources are not downloaded. A Baidu Netdisk download link for the 1.7 source is provided below.

Link: pan.baidu.com/s/1mxhqVClc… Password: DWPT

Background knowledge

This section is a quick primer for readers who are not comfortable with the underlying data structures; if you already know them, feel free to skip it.

  • Version 1.7 HashMap uses an array + linked list
  • Version 1.8 HashMap uses an array + linked list + red-black tree

If these data structures are unfamiliar, the (Little Turtle) data structures and algorithms course is a good place to fill in the basics.

Version 1.7 HashMap: how it works

Before diving into the source code, here is a brief overview of the 1.7 HashMap so that the code is easier to follow. As noted in the background knowledge section, the 1.7 HashMap is backed by an array + linked list.

Given a key, we hash it to decide which array slot it belongs to. 🌰 For example, map.put("name", "river ocean"): after hashing, the index for "name" is 0, so the value "river ocean" is stored at index 0 of the array.

But two keys can hash to the same index. 🌰 map.put("size", "18 cm"): after hashing, the index for "size" turns out to also be 0. This is the so-called hash collision: "size" is inserted in front of "name", forming a linked list.

With enough keys put in, the slots fill up. So a resize mechanism is needed, one that actively grows the array.
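Here is a minimal runnable sketch of the scenario above (which bucket a key lands in is illustrative; real indices depend on the JDK's hash function):

import java.util.HashMap;
import java.util.Map;

public class PutDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("name", "river ocean"); // suppose "name" lands in bucket 0
        map.put("size", "18 cm");       // if "size" also lands in bucket 0, it collides
        System.out.println(map.get("size")); // "18 cm", found by walking bucket 0's list
    }
}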

The source code should answer the following questions:

  • How long should the array be?
  • When will the capacity be expanded?
  • How to expand capacity?
  • How to query the corresponding value by key?

With these questions in mind, let's look at the source code.

What happens when we new a HashMap

Click into the no-argument constructor. You can see that it calls the two-argument constructor, passing in two fixed values. These two values are important:

  • DEFAULT_INITIAL_CAPACITY: the default array length, 16
  • DEFAULT_LOAD_FACTOR: the load factor; when the array's occupancy reaches 16 * 0.75 = 12, the map resizes automatically
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // value is 16
static final float DEFAULT_LOAD_FACTOR = 0.75f;

public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
public HashMap(int initialCapacity, float loadFactor) {
    // The initial capacity cannot be negative
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // The capacity cannot exceed MAXIMUM_CAPACITY (1073741824)
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // The load factor must be positive and not NaN
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // Store the load factor; threshold temporarily holds the initial capacity
    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    // Empty hook method, overridden by subclasses (LinkedHashMap)
    init();
}

As you can see, new HashMap() does not create the array; it only records the intended array length and the load factor.

What happens on put

Continuing with the source, let's see what happens on put.

public V put(K key, V value) {
    // If the table is still the empty array, create it first
    if (table == EMPTY_TABLE) {
        // Creates the array; threshold is 16 at this point (set in the constructor)
        // See the "How the array is created" section for details
        inflateTable(threshold);
    }
    // null keys are handled separately
    if (key == null)
        // See the "When the key is null" section for details
        return putForNullKey(value);
    // Compute the hash of the key
    int hash = hash(key);
    // Map the hash to an array index
    int i = indexFor(hash, table.length);
    // Walk the linked list in bucket i
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // Same hash and same key: overwrite the old value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    // Key not found: add a new entry. See the "addEntry" section for details
    addEntry(hash, key, value, i);
    return null;
}

How the array is created

inflateTable(threshold) creates the array. Let's see how.

private void inflateTable(int toSize) {
    // Round the requested size (16 here) up to a power of two (16, 32, ...)
    int capacity = roundUpToPowerOf2(toSize);
    // Compute the resize threshold: array length * load factor, 16 * 0.75 = 12
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    // Create the array, length 16 by default
    table = new Entry[capacity];
    // Initialize the hash seed if needed
    initHashSeedAsNeeded(capacity);
}

As you can see, the array is actually created lazily, on the first put.

When the key is null

The put method above routes a null key to putForNullKey. Here is what happens:

private V putForNullKey(V value) {
    // The null key always lives in bucket 0; walk its linked list
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        // Look for an existing entry with a null key
        if (e.key == null) {
            // Save the old value so it can be returned
            V oldValue = e.value;
            // Overwrite the old value with the new one
            e.value = value;
            // Empty hook method, used by subclasses (LinkedHashMap)
            e.recordAccess(this);
            return oldValue;
        }
    }
    // Structural modification count +1
    modCount++;
    // Not found: add a new entry to bucket 0
    // See the "addEntry" section for details
    addEntry(0, null, value, 0);
    return null;
}

addEntry

Both "When the key is null" and "What happens on put" end up calling addEntry.

void addEntry(int hash, K key, V value, int bucketIndex) {
    // Resize when the size reaches the threshold (16 * 0.75 = 12)
    // and the target bucket already holds a value (is not null):
    // only then is the resize condition met
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Double the array length; see the "resize" section for details
        resize(2 * table.length);
        // Recompute the hash (a null key hashes to 0)
        hash = (null != key) ? hash(key) : 0;
        // Recompute the bucket index against the new, larger table
        bucketIndex = indexFor(hash, table.length);
    }
    // Otherwise just create a new Entry; see the "createEntry" section for details
    createEntry(hash, key, value, bucketIndex);
}

resize

This method performs the actual resizing.

void resize(int newCapacity) {
    Entry[] oldTable = table;
    // The old array length; 16 on the first resize
    int oldCapacity = oldTable.length;
    // Stop growing once the maximum capacity is reached
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    // Create the new array (length 32 on the first resize)
    Entry[] newTable = new Entry[newCapacity];
    // Migrate the old entries to the new array; see the "transfer" section for details
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    // Point table at the new array
    table = newTable;
    // Recompute the threshold: 32 * 0.75 = 24 on the first resize
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

transfer

As you saw in the resize section, this method migrates data from the old array to the new one.

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    // Iterate over the old array
    for (Entry<K,V> e : table) {
        // Walk each bucket's linked list
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                // Recompute the hash (a null key hashes to 0)
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Map the hash to an index in the new array (0-31 after the first resize)
            int i = indexFor(e.hash, newCapacity);
            // Head insertion: the moved entry becomes the new head of bucket i
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

createEntry

void createEntry(int hash, K key, V value, int bucketIndex) {
    // The current head of the bucket (may be null)
    Entry<K,V> e = table[bucketIndex];
    // Head insertion: the new Entry points at the old head and becomes the bucket's head
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    // size + 1; this is the size that addEntry compares against the threshold
    size++;
}

createEntry increments size; map.remove("key") decrements it. See the "removeEntryForKey" section for details.

🌰 Example

First put("name", "river ocean"), then put("age", "22"). Suppose both hash to index 0, so they land in the same bucket. The resulting structure looks like the figure below. Notice that head insertion is used: next points at the next element. The reason for head insertion rather than tail insertion is that the most recently inserted element is often the one looked up first; placing it at the head makes that lookup faster. Recall from the background knowledge section that searching a linked list is O(n), so lookups in long lists are slow.
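Here is a simplified sketch of that head insertion (not the JDK code; the Entry class below is a stripped-down stand-in):

public class HeadInsertSketch {
    // A stripped-down stand-in for java.util.HashMap.Entry
    static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        Entry(K key, V value, Entry<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    public static void main(String[] args) {
        @SuppressWarnings("unchecked")
        Entry<String, String>[] table = new Entry[16];
        int i = 0; // assume both keys map to bucket 0
        table[i] = new Entry<>("name", "river ocean", table[i]); // bucket: name
        table[i] = new Entry<>("age", "22", table[i]);           // bucket: age -> name
        System.out.println(table[i].key + " -> " + table[i].next.key); // age -> name
    }
}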

removeEntryForKey

final Entry<K,V> removeEntryForKey(Object key) {
    if (size == 0) {
        return null;
    }
    // Compute the hash (a null key hashes to 0)
    int hash = (key == null) ? 0 : hash(key);
    // Map the hash to the array index
    int i = indexFor(hash, table.length);
    // Get the linked list in the corresponding bucket
    Entry<K,V> prev = table[i];
    Entry<K,V> e = prev;
    // Walk the list
    while (e != null) {
        // Remember the next node before unlinking
        Entry<K,V> next = e.next;
        Object k;
        // Compare hash and key to make sure this is the right entry
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
            modCount++;
            // size - 1
            size--;
            if (prev == e)
                // The match is the head of the list: the bucket now starts at next
                table[i] = next;
            else
                // Unlink: the predecessor skips over the removed entry
                prev.next = next;
            // Empty hook method, used by subclasses
            e.recordRemoval(this);
            return e;
        }
        prev = e;
        e = next;
    }
    return e;
}

🌰 Example

Put name, age, and cool in sequence, assuming all three hash to the same bucket. The linked list then looks like this:

map.put("name"."River ocean");
map.put("age"."22");
map.put("cool"."Handsome");
Copy the code

To delete age, you only need to point cool's next at name; that single relink completes the remove operation.
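Continuing the simplified Entry sketch from the earlier example (again illustrative, not the JDK code), the removal is just one pointer update:

// Bucket 0 after the three puts (head insertion): cool -> age -> name
Entry<String, String> prev = table[0];    // cool
Entry<String, String> target = prev.next; // age, the entry to remove
prev.next = target.next;                  // relink: cool -> name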

Query element

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}
final Entry<K,V> getEntry(Object key) {
    // If nothing has been put yet, size is 0 (createEntry is what increments it)
    if (size == 0) {
        return null;
    }
    // Compute the hash (a null key hashes to 0)
    int hash = (key == null) ? 0 : hash(key);
    // Map the hash to the array index and walk that bucket's linked list
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

Version 1.8 HashMap

Now let's take a look at the 1.8 HashMap.

Initialization

In 1.7, the constructor sets two important parameters:

  • DEFAULT_INITIAL_CAPACITY: the default array length, 16
  • DEFAULT_LOAD_FACTOR: the load factor; when the array's occupancy reaches 16 * 0.75 = 12, the map resizes automatically

In 1.8, the no-argument constructor only sets the load factor, which defaults to 0.75.

public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

put

put in 1.8 also differs from put in 1.7.

public V put(K key, V value) {
    // See the "putVal" section for details
    return putVal(hash(key), key, value, false, true);
}

The first difference is the hash algorithm.
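For reference, here is the 1.8 hash function as it appears in the JDK 8 source: it mixes the high 16 bits of hashCode into the low 16 bits with a single shift and XOR, so small tables (which only use the low bits for indexing) still feel the influence of the high bits.

static final int hash(Object key) {
    int h;
    // XOR the high 16 bits into the low 16 bits
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}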

putVal

The 1.8 putVal method is really long, and its readability is poor. Thumbs down!

On the first put, the table is null and the threshold is 0, so resize() is called to create an array of length 16 and record the threshold and related fields. Once the array exists, the key is hashed, and the hash determines which index of the array, i.e. which bucket, the entry belongs to. With the bucket located, a new Node (the 1.8 counterpart of 1.7's Entry) is created as the first element of that bucket's list.

On subsequent puts, the bucket is again located via the key, and the first element of the bucket's list is compared. If its key matches, the value is replaced: the same key was put again. If the first element does not match, the code checks whether the bucket holds a tree structure; if not, it traverses the linked list. If a matching key is found along the way, its value is replaced with the new value. If no match is found anywhere, the put key is a new key.

A new key is appended with tail insertion, after which the length of the list is checked: if it exceeds 7, the list is converted into a tree. Conversely, when a tree shrinks to 6 nodes or fewer, it degenerates back into a linked list. With many nodes, a red-black tree finds entries more efficiently; after all, a red-black tree is a binary search tree.
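For reference, these are the JDK 8 constants that govern the conversions (note that treeifyBin also requires the table to have at least MIN_TREEIFY_CAPACITY slots; otherwise it resizes instead of treeifying):

static final int TREEIFY_THRESHOLD = 8;     // convert a bucket's list to a tree above this
static final int UNTREEIFY_THRESHOLD = 6;   // convert a tree back to a list at or below this
static final int MIN_TREEIFY_CAPACITY = 64; // minimum table size before treeifying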

At the end of put, the map's size is checked against the threshold; if it exceeds it, the table is resized.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // Declare variables. The source puts these on one line; split here for readability
    Node<K,V>[] tab;
    Node<K,V> p;
    int n, i;
    // Assign table to the local variable tab and check whether it is null;
    // copy tab's length to n and check whether it is 0
    if ((tab = table) == null || (n = tab.length) == 0) {
        // [first put] resize() creates the new array and assigns it to tab;
        // the new array's length is copied to n
        n = (tab = resize()).length;
    }
    // Map the hash to an array index, copy the index to i,
    // assign the element at index i to p, and check whether p is null
    // [first put] 😤 this line is really hard to read!
    if ((p = tab[i = (n - 1) & hash]) == null)
        // Empty bucket: create a new Node as its first element
        tab[i] = newNode(hash, key, value, null);
    else {
        // The bucket is not empty
        Node<K,V> e;
        K k;
        // Compare hash and key, same as in 1.7
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) {
            // Same key: remember the existing node in e
            e = p;
        // Is the bucket a tree node? (1.8's red-black tree)
        } else if (p instanceof TreeNode) {
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        } else {
            // Walk the linked list looking for the key
            for (int binCount = 0; ; ++binCount) {
                // Reached the end without finding the key: create a new Node
                if ((e = p.next) == null) {
                    // Tail insertion: the new Node is appended at the end
                    p.next = newNode(hash, key, value, null);
                    // The treeify threshold is reached: (8 - 1) = 7
                    if (binCount >= TREEIFY_THRESHOLD - 1) {
                        // The list becomes a tree
                        treeifyBin(tab, hash);
                    }
                    break;
                }
                // Found an existing node with the same key: stop (handled by e != null below)
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
                    break;
                }
                p = e;
            }
        }
        // e is non-null when an existing node with the same key was found;
        // putting a previously seen key becomes an overwrite
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null) {
                // Replace the old value with the new one
                e.value = value;
            }
            // Empty hook method (used by LinkedHashMap)
            afterNodeAccess(e);
            // Return the old value
            return oldValue;
        }
    }
    ++modCount;
    // Check whether to resize: size exceeds the threshold (array length * load factor, 12 at first)
    if (++size > threshold) {
        // Resize condition met
        resize();
    }
    afterNodeInsertion(evict);
    return null;
}

resize

This method mainly handles resizing, and it is also fairly long. To read it, follow the [first put] comments first, then the [resize] comments, then the [migrate] comments.

On the first put, it creates an array of length 16. On later calls, when size exceeds array length * load factor, the data in the old array is migrated to a new array twice as long. Compared with 1.7, there is no second hashing of each element during migration.
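A small worked example of the migration trick (the hash values are made up for illustration): during a resize from 16 to 32, the single bit tested by (e.hash & oldCap) decides whether a node stays at its index or moves up by the old capacity.

public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;                      // binary 1_0000
        int hashA = 0b0_0101;                 // bit 4 is 0
        int hashB = 0b1_0101;                 // bit 4 is 1
        System.out.println(hashA & oldCap);   // 0  -> stays at index 5
        System.out.println(hashB & oldCap);   // 16 -> moves to index 5 + 16 = 21
        System.out.println(hashB & (32 - 1)); // 21, same answer as recomputing the index
    }
}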

final Node<K,V>[] resize() {
      // [first put] table is null, so oldTab is null
      Node<K,V>[] oldTab = table;
      // [first put] oldTab is null, so oldCap = 0
      int oldCap = (oldTab == null) ? 0 : oldTab.length;
      // [first put] the threshold is still 0
      int oldThr = threshold;
      int newCap, newThr = 0;
      // [resize] on later calls oldCap is non-zero, e.g. 16
      if (oldCap > 0) {
          // [resize] the capacity is capped at the maximum
          if (oldCap >= MAXIMUM_CAPACITY) {
              threshold = Integer.MAX_VALUE;
              return oldTab;
          // [resize] double the capacity: newCap = oldCap * 2
          } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                   oldCap >= DEFAULT_INITIAL_CAPACITY) {
              // [resize] the new threshold is twice the old one,
              // e.g. 12 becomes 24 on the first resize
              newThr = oldThr << 1;
          }
      } else if (oldThr > 0) {
          newCap = oldThr;
      } else {
          // [first put] array length 16, threshold 16 * 0.75 = 12
          newCap = DEFAULT_INITIAL_CAPACITY;
          newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
      }
      if (newThr == 0) {
          float ft = (float)newCap * loadFactor;
          newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                    (int)ft : Integer.MAX_VALUE);
      }
      // [first put] the threshold becomes 12
      threshold = newThr;
      // [first put] create an array of length 16
      // [resize] create an array twice the old length
      Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
      // Point table at the new array
      table = newTab;
      // [first put] oldTab is null, so the migration below is skipped
      if (oldTab != null) {
          // [migrate] walk the old array
          for (int j = 0; j < oldCap; ++j) {
              Node<K,V> e;
              // [migrate] assign bucket j to e and check whether it is null
              if ((e = oldTab[j]) != null) {
                  oldTab[j] = null;
                  // [migrate] if next is null, the bucket holds a single node
                  if (e.next == null) {
                      // Place the single node directly into the new array
                      newTab[e.hash & (newCap - 1)] = e;
                  } else if (e instanceof TreeNode) {
                      // [migrate] split the tree across the new array
                      ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                  } else {
                      // [migrate] the bucket holds a list with more than one node;
                      // split it into two lists: a low-order list and a high-order list
                      Node<K,V> loHead = null, loTail = null;
                      Node<K,V> hiHead = null, hiTail = null;
                      Node<K,V> next;
                      do {
                          next = e.next;
                          // Bit (e.hash & oldCap) decides which list the node joins
                          if ((e.hash & oldCap) == 0) {
                              if (loTail == null)
                                  loHead = e;
                              else
                                  loTail.next = e;
                              loTail = e;
                          }
                          else {
                              if (hiTail == null)
                                  hiHead = e;
                              else
                                  hiTail.next = e;
                              hiTail = e;
                          }
                      } while ((e = next) != null);
                      // The low list stays at index j; the high list moves to j + oldCap
                      if (loTail != null) {
                          loTail.next = null;
                          newTab[j] = loHead;
                      }
                      if (hiTail != null) {
                          hiTail.next = null;
                          newTab[j + oldCap] = hiHead;
                      }
                  }
              }
          }
      }
      return newTab;
  }

Fail-fast

HashMap traversal uses a fail-fast mechanism, common among Java's non-thread-safe collections: if one thread modifies, deletes from, or adds to the collection while another is iterating over it, a concurrent modification exception is raised.

It is implemented by saving a copy of modCount before traversal and comparing the current modCount with the saved modCount each time the next element to be traversed is fetched.

Fail-fast can also be seen as a safety mechanism: when a non-thread-safe collection is modified concurrently, the mechanism surfaces the problem by throwing a ConcurrentModificationException instead of letting the iteration silently misbehave.
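A minimal sketch that trips the fail-fast check (the keys are arbitrary):

import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("name", "river ocean");
        map.put("age", "22");
        for (String key : map.keySet()) {
            // A structural modification bumps modCount; the iterator notices the
            // mismatch on its next step and throws ConcurrentModificationException
            map.put("cool", "Handsome");
        }
    }
}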

Why is 1.8 more efficient than 1.7

There are two main reasons:

  • Hash algorithm: the JDK 1.8 implementation optimizes the mixing of the high-order bits (see the hash function shown earlier)
  • Red-black trees: when a bucket holds a large amount of data, a red-black tree, being a binary search tree, is searched more efficiently than a linked list

Why is HashMap thread unsafe

  • In multithreading, concurrent resizing can form a circular linked list, leading to an infinite loop; for details, see the example in "The Java 8 series reimagines HashMap"
  • Concurrent deletes or updates can overwrite each other's writes, leaving the data inaccurate

Thread-safe HashMap

  • ConcurrentHashMap

It uses a segment + lock strategy for finer-grained locking. See "Overview of Map (iii): Thoroughly understand ConcurrentHashMap" for details.

  • Hashtable

It achieves thread safety by locking every method. It is a legacy class from Java 1.1 and is not recommended, since it loses to the alternatives in both performance and usability.

  • SynchronizedMap

He’s Collections. SynchronizeMap () method returns the object, he realize thread safe method and the Hashtable consistent, lock directly

public static <K,V> Map<K,V> synchronizedMap(Map<K,V> m) {
    return new SynchronizedMap<>(m);
}
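A usage sketch; note that, per the Javadoc, iteration over the wrapper still has to be synchronized manually:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncMapDemo {
    public static void main(String[] args) {
        Map<String, String> map = Collections.synchronizedMap(new HashMap<>());
        map.put("name", "river ocean"); // individual calls are locked internally
        synchronized (map) {            // but iteration must hold the lock explicitly
            for (Map.Entry<String, String> e : map.entrySet()) {
                System.out.println(e.getKey() + " = " + e.getValue());
            }
        }
    }
}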

What is a hash table

A hash table, also known as a hash map, is a data structure that accesses values in memory directly by key. In the HashMap shown above, the key is turned into a hash value; the backing array is the hash table, and the method that computes the hash value is the hash function.
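As a tiny sketch of that definition (simplified: the real 1.7/1.8 HashMaps further mix the bits of hashCode before masking):

public class HashIndexDemo {
    public static void main(String[] args) {
        String key = "name";
        int hash = key.hashCode();   // the hash function maps the key to an int
        int index = hash & (16 - 1); // the masked hash picks a slot in the backing array
        System.out.println(index);
    }
}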

Why the default load factor is 0.75

If the load factor were 1, resizing would not trigger until the array is completely full, which is a big problem because hash collisions are unavoidable. A load factor of 1.0 means many collisions, and the underlying red-black trees become extremely deep and complex, which is very bad for query efficiency. In that case, time is sacrificed to maximize space utilization.

With a load factor of 0.5, the array resizes when it is only half full. With fewer elements stored, hash collisions drop, the underlying lists shorten and the red-black trees flatten, so query efficiency improves. But space utilization falls sharply: storing 1M of data would now need 2M of space. 0.75 is the tradeoff between time and space.

Why the array length is a power of two

So let’s see how do we get the index from the key, hash the key, hash the hash value and the length of the array minus 1

🌰 Take the hash value below and AND it with 15. Only the last 4 bits of the hash survive in the result, because the last 4 bits of 15 are all 1s:

0000 0000 0000 1010 1001 0000 0000 1111

No matter what the hash value is, ANDing it with 15 keeps only the last four bits, so the result falls in the interval [0, 15].

So let’s do the same thing with base 2 32 minus 1

0000 0000 0001 1111

Let’s look at base 64 minus 1

0000 0000 0011 1111

Because a power-of-two length minus 1 is all 1s in binary, hash & (length - 1) distributes values evenly across the range [0, array length - 1].
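A quick demonstration (with an arbitrary hash value) that masking with length - 1 only keeps the low bits:

public class MaskDemo {
    public static void main(String[] args) {
        int hash = 0b1010_1001_0000_1101;    // arbitrary example hash
        System.out.println(hash & (16 - 1)); // & 0b1111:    13, always in [0, 15]
        System.out.println(hash & (32 - 1)); // & 0b1_1111:  13, always in [0, 31]
        System.out.println(hash & (64 - 1)); // & 0b11_1111: 13, always in [0, 63]
    }
}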

Why 1.7's head insertion was changed to tail insertion in 1.8

In JDK 1.7, each rehashed element is inserted at the head of its new bucket, so a list originally in order A->B->C may come out of a rehash as C->B->A. In a concurrent scenario, this reversal during resizing can produce a circular linked list. In JDK 1.8, elements keep their order during migration (tail insertion), which avoids the circular-list problem.
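A comment trace of 1.7's transfer (see the transfer section above) showing the reversal; this is the state that, interleaved across two threads, can produce a cycle:

// Single-threaded 1.7 transfer of a bucket A -> B -> C (head insertion):
//   move A: newTab[i] = A                  bucket: A
//   move B: B.next = A; newTab[i] = B      bucket: B -> A
//   move C: C.next = B; newTab[i] = C      bucket: C -> B -> A
// With two threads resizing at once, these pointer updates can interleave so
// that two nodes end up pointing at each other, forming a circular list.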

References

  • The Java 8 series reimagines HashMap
  • Overview of Map (iii) : Thoroughly understand ConcurrentHashMap
  • Why is the initial load factor for HashMap 0.75? This article tells you the answer in the most popular way
  • Let’s go into Dachang series -HashMap