Overview of HashMap

A Map is an abstract interface for key-value mappings that do not contain duplicate keys, that is, one Key corresponds to one Value. HashMap is an important member of the Java Collection Framework and one of the most commonly used of the Map family (shown below). Simply put, a HashMap is an implementation of the Map interface based on a hash table, in the form of key-value, that is, the stored object is an Entry (containing both a Key and a Value). In a HashMap, it calculates the location of a key-value based on the hash algorithm and accesses it quickly. In particular, a HashMap allows at most one Entry to have a Null key (multiple entries will be overwritten), but allows multiple entries to have Null values. In addition, a HashMap is an asynchronous implementation of a Map.

It is important to note that although the container claims to store Java objects, it does not actually put Java objects into the container, only keeps references to those objects in the container. That is, the Java container actually contains reference variables that point to the Java objects we are actually saving.

HashMap is defined in the JDK

HashMap implements the Map interface and inherits the AbstractMap abstract class, which defines key-value mapping rules. Similar to the AbstractCollection abstract class in the Collection family, the AbstractMap abstract class provides a backbone implementation of the Map interface to minimize the effort required to implement the Map interface. HashMap is defined in the JDK as:

public class HashMap<K.V>
    extends AbstractMap<K.V>
    implements Map<K.V>, Cloneable.Serializable{... }Copy the code

C

There are four constructors provided by HashMap. The constructor that takes no arguments by default and the constructor that takes Map are recommended implementations of the Java Collection Framework specification. The other two constructors are provided by HashMap.

1, a HashMap ()

This constructor is intended to construct an empty HashMap with > default initial capacity (16) and default load factor (0.75). It is recommended by the Java Collection Framework specification and the source code is as follows:

    ** * Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75). */ Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75)
   public HashMap(a) {
   		// Load factor: Measures how much space is used in a hash table
        this.loadFactor = DEFAULT_LOAD_FACTOR; 

        // Threshold for HashMap expansion, which is equal to the capacity of the HashMap multiplied by the load factor
        threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);

        // The underlying implementation of a HashMap is still an array, except that each item of the array is a chain
        table = new Entry[DEFAULT_INITIAL_CAPACITY];

        init();
    }
Copy the code
2, HashMap(int initialCapacity, float loadFactor)

This constructor is intended to construct an empty HashMap that specifies an initial capacity and a specified load factor. The source code is as follows:

	** * Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75). */ Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75)
 public HashMap(int initialCapacity, float loadFactor) {
 		// The initial capacity cannot be less than 0
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
         // The initial capacity cannot exceed 2^30
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        // Load factor cannot be less than 0
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        // Load factor
        this.loadFactor = loadFactor;
        // Set the HashMap capacity limit. When the HashMap capacity reaches this limit, the HashMap capacity will be expanded automatically
        this.threshold = tableSizeFor(initialCapacity);
    }
Copy the code
3, a HashMap (int initialCapacity)

This constructor is intended to construct an empty HashMap that specifies the initial capacity and default load factor (0.75). The source code is as follows:

	/** * Constructs an empty HashMap with the specified initial * capacity and the default load factor (0.75). */ Constructs an empty HashMap with the specified initial * capacity and the default load factor (0.75)
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
Copy the code
4, a HashMap (Map <? extends K, ? extends V> m)

This constructor is intended to construct a HashMap that has the same mapping as the specified Map, with an initial capacity of at least 16 (depending on the size of the specified Map) and a load factor of 0.75, as recommended by the Java Collection Framework specification.

	/** * Constructs a new HashMap with the same mappings as the specified Map. * The HashMap is Created with default load factor (0.75) and an initial capacity sufficient to hold the mappings in the specified Map. */
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

	/** * Implements Map.putAll and Map constructor */
    final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
        int s = m.size();
        if (s > 0) {         
            if (table == null) { // pre-size
            	// The initial capacity is not less than 16
                float ft = ((float)s / loadFactor) + 1.0 F;
                int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                         (int)ft : MAXIMUM_CAPACITY);
                if (t > threshold)
                    threshold = tableSizeFor(t);
            }
            else if (s > threshold)
                resize();
            for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
                K key = e.getKey();
                V value = e.getValue();
                putVal(hash(key), key, value, false, evict); }}}Copy the code

Here, we mention two very important parameters: initial capacity and load factor, which are important parameters that affect the performance of a HashMap. Capacity refers to the number of buckets in the hash table (the size of the table array). The initial capacity is the number of buckets when the hash table is created. Load factor is a measure of how full a hash table can be before its capacity is automatically increased. It measures how much space is used in a hash table. A larger load factor means that the hash table is more loaded, and vice versa.

For a hash table using the zipper method (mentioned below), the average time to find an element is O(1+a), where a refers to the length of the chain and is a constant. In particular, if the load factor is larger, the utilization of space will be more full, but the search efficiency will be lower; The smaller the load factor, the more sparse the data in the hash table will be and the more serious the waste of space will be. The default load factor is 0.75, which is a tradeoff between time and space costs, and we generally don’t need to change it.

4. Data structure of HashMap

1. Related concepts of hashing

A Hash is a Hash algorithm that transforms an input of any length (also known as a pre-image, or pre-image) into a fixed-length output (usually an integer), which is the Hash value. This transformation is a compression mapping, that is, the space of the hash value is usually much smaller than the space of the input. Different inputs may be hashed into the same output, making it impossible to determine the input value uniquely from the hash value. Simply put, it is a kind of information digest function that compresses the message of arbitrary length to a certain fixed length.

2. Application of hash: data structure

The memory structure and principles of HashMap, as well as thread safety, are hot topics for interviews. Java data structure can be used to solve the basic array + linked list.

  • Advantages and disadvantages of arrays: Easy to find with subscript indexes, but difficult to insert or delete an element in an array.
  • Advantages and disadvantages of linked list: because in the linked list to find an element needs to traverse the linked list to find, and insert, delete fast. Therefore, linked lists are suitable for fast insert and delete scenarios, not lookup scenarios.

HashMap is a combination of the advantages of the above two data structures. HashMap consists of Entry array + linked list, as shown in the figure below:

As we can see from the figure above, the left-hand side is clearly an array, and each member of the array is a linked list. All elements contained in the data structure contain a pointer for linking between elements. We assign elements to linked lists based on their characteristics, and in turn we find the correct list and the correct element from the linked list. Among them, the method of calculating the subscript of element array according to element characteristics is hashing algorithm.

From the figure above, we can see that a HashMap is composed of an array of entries + a linked list. Each element in a 16-length array stores the head of a linked list. So what are the rules for how these elements are stored in the array? It is usually obtained by hash(key)%len, which is the hash of the element’s key modulo the length of the array. For example, in the hash table above, 12%16=12,28%16=12,108%16=12,140%16=12. So 12, 28, 108, and 140 are all stored at 12 subscript.

3. Data structure of HashMap

As we know, the two most commonly used structures in Java are arrays and linked lists. Almost all data structures can be combined with these two structures, and HashMap is a typical example of this application. In fact, a HashMap is a linked list array, and its data structure is as follows:

From the figure above, we can see visually that the underlying implementation of HashMap is still an array, except that each item of the array is a chain. The parameter initialCapacity represents the length of the array, which is the number of buckets. In section 3 we looked at the source of the default constructor for HashMap:

    ** * Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75). */ Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75)
   public HashMap(a) {
   		// Load factor: Measures how much space is used in a hash table
        this.loadFactor = DEFAULT_LOAD_FACTOR; 

        // Threshold for HashMap expansion, which is equal to the capacity of the HashMap multiplied by the load factor
        threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);

        // The underlying implementation of a HashMap is still an array, except that each item of the array is a chain
        table = new Entry[DEFAULT_INITIAL_CAPACITY];

        init();
    }
Copy the code

Each time a HashMap is created, it initializes a table array of Entry type. The Entry type is defined as follows:

static class Entry<K.V> implements Map.Entry<K.V> {

    final K key;     // The key of the key-value pair
    V value;        // The value of the key-value pair
    Entry<K,V> next;    // Next node
    final int hash;     The return value of the hash(key.hashcode ()) method

    /** * Creates new entry. */
    Entry(int h, K k, V v, Entry<K,V> n) {     // Entry's constructorvalue = v; next = n; key = k; hash = h; }... }Copy the code

Entry is the internal class of HashMap, realizing the map. Entry interface, which contains four attributes: key, value, next node, and hash value. In fact, Entry is the building block of a hash table, the concrete form of the elements that the hash table stores.

Quick access to HashMap

The two most commonly used operations in a HashMap are PUT (Key,Value) and get(Key). We all know that a Key in a HashMap is unique, but how is it guaranteed to be unique? The first thing that comes to mind is equals, and yes, that works, but as you add more elements, put and get become less and less efficient, so the time is O(n). That is, if a HashMap has 1,000 elements, then putting would require 1,000 comparisons, which would be time-consuming and far from the quick access of a HashMap. In practice, HashMap rarely makes use of equals because it manages all elements in a hash table, using hash algorithms to quickly access elements. When we call the put method to store a value, the HashMap first calls the Key’s hashCode method, and then obtains the Key hashCode based on that, using the hashCode to quickly find a bucket, which can be called bucketIndex. If two objects have different hashcodes, then equals must be false; Otherwise, equals is not set to true if the hashcodes are the same. So, in theory, hashCode could have a collision. When a collision occurs, the elements stored in the bucketIndex bucket are fetched and compared one by one using hashCode() and equals() to determine whether the Key already exists. If it already exists, it replaces the old Value with the new Value and returns the old Value. If it does not exist, the new Key/Value pair <Key, Value> is stored in the bucket. Therefore, in a HashMap, equals() is only used when hash codes collide.

Let’s look at the implementation of HashMap access using JDK source code.

1. Storage implementation of HashMap

In a HashMap, key-value pairs are stored using the put(key,vlaue) method. The source code is as follows:

 	/** * Associates the specified value with the specified key in this map. * If the map previously contained a mapping for  the key, the old * value is replaced. *@param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with key, or null if there was no mapping for key.
     *  Note that a null return can also indicate that the map previously associated null with key.
     */
    public V put(K key, V value) {

        // When the key is null, putForNullKey is called and the key-value pair is saved to the first location of the table
        if (key == null)
            return putForNullKey(value); 

        // Calculates the hash value based on the key's hashCode
        int hash = hash(key.hashCode());             / / -- -- -- -- -- -- -- (1)

        // Calculate the location of the key-value pair in the array (which bucket)
        int i = indexFor(hash, table.length);              / / -- -- -- -- -- -- -- (2)

        // Iterate over the ith bucket of the table to find where the key is saved
        for(Entry<K,V> e = table[i]; e ! =null; e = e.next) {      / / -- -- -- -- -- -- -- (3)
            Object k;
            // Determine if there is a mapping on the chain with the same hash value and the same key value. If there is, overwrite value directly and return the old value
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;    // Return the old value
            }
        }

        modCount++; // Increase the number of changes by 1, fast failure mechanism

        // This map is not in the original HashMap. Add it to the header of the chain
        addEntry(hash, key, value, i);            
        return null;
    }
Copy the code

Using the above source code, we can clearly see how HashMap saves data. First, check whether the key is null. If it is null, call putForNullKey directly. If it is not empty, the hash value of the key is computed and the index position in the table array is searched based on the hash value. If the table array has an element in that position, the index position is searched for whether the same key exists. If there is an element in the table array, the value of the original key is overwritten; otherwise, the element is stored in the header (the first element is stored at the end of the chain). In addition, if the table has no elements there, it is saved directly. This may seem like a simple process, but there are a lot of things to think about.

First look at (3) in the source code, where the reason for iteration is to prevent the existence of the same key value. If two hash values (keys) are found to be the same, the HashMap does this by replacing the old value with the new value. There is no handling of keys, which explains why there are no two identical keys in the HashMap.

1). Special handling of NULL keys putForNullKey()

  /** * Offloaded version of put for null keys */
    private V putForNullKey(V value) {
        // 若key==null,则将其放入table的第一个桶,即 table[0]
        for (Entry<K,V> e = table[0]; e ! =null; e = e.next) {   
            if (e.key == null) {   // If a key with a null key already exists, the value is replaced and the old value is returned
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;        // Fail quickly
        addEntry(0.null, value, 0);       // Otherwise, add it to the bucket of table[0]
        return null;
    }
Copy the code

As you can see from the above source code, a HashMap can hold a unique key-value pair with a NULL key. If you add a NULL key-value pair to it again, its original value will be overwritten. Also, if there is a NULL key-value pair in the HashMap, it must be in the first bucket.

2). Hash strategy in HashMap (algorithm)

In the source code for the put(key,vlaue) method above, we have identified the hash strategy in the HashMap (i.e. (1), (2)). The hash() method is used to recalculate the hashCode of the key. The indexFor() method is used to generate the insertion position of this Entry object. When the calculated hash value is ampersand (length-1) of the hashMap, a value in the range [0, length-1] is obtained. In particular, the more evenly distributed this value is, the more space utilization and access efficiency of the HashMap will be.

    /** * Computes key.hashCode() and spreads (XORs) higher bits of hash * to lower. Because the table uses power-of-two masking, sets of * hashes that vary only in bits above the current mask will * always collide. (Among known examples are sets of Float keys * holding consecutive whole numbers in small tables.) So we * apply a transform that spreads the impact of higher bits * downward. There is a tradeoff between speed, utility, and * quality of bit-spreading. Because many common sets of hashes * are already reasonably distributed (so don't benefit from * spreading), and because we use trees to handle large sets of * collisions in bins, we just XOR some shifted bits in the * cheapest possible way to reduce systematic lossage, as well as * to incorporate impact of the highest bits that would otherwise * never be used in index calculations because of table bounds. */
    static final int hash(Object key) {
        int h;
        return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
    }
Copy the code

As the JDK officially describes this method, the use of the hash() method to recalulate an object’s hashCode is intended to prevent a poor-quality implementation of the hashCode() function. Since the length of the supporting array of a hashMap is always a power of 2, the right shift can make the lower-order data as different as possible, thus making the hash value as evenly distributed as possible.

After calculating the hash value of the Key using the hash() method above, how can we ensure that the elements are evenly distributed in each bucket of the table? HashMap (indexFor()) = indexFor(); HashMap = indexFor();

 /** * Returns index for hash code h. */
    static int indexFor(int h, int length) {
        return h & (length-1);  // Function is equivalent to modulo operation, but this way is more efficient
    }
Copy the code

We know that the underlying array length of a HashMap is always 2 to the n. When length is 2 to the n, h&(Length-1) is equivalent to modulating Length and is much faster than modulating length directly, which is an optimization of the speed of HashMap.

In summary, the hash() and indexFor() methods above serve only one purpose: to ensure that elements are evenly distributed across each bucket of the table to maximize space.

3). Add key-value pairs to HashMap: addEntry()

    /** * Adds a new entry with the specified key, value and hash code to * the specified bucket. It is the responsibility of this * method to resize the table if Appropriate. * * Subclass overrides this to alter the behavior of put method. * * always add a new element to the head of the list */
    void addEntry(int hash, K key, V value, int bucketIndex) {

        // Get the list at bucketIndex
        Entry<K,V> e = table[bucketIndex];

        // Link the newly created Entry to the head of the list at bucketIndex
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);

        // If the number of elements in the HashMap exceeds the threshold, the capacity is doubled
        if (size++ >= threshold)
            resize(2 * table.length);
    }
Copy the code

Through the above source code we can clearly understand the timing of the chain. A HashMap always adds a new Entry object to a bucketIndex. If a bucketIndex already has an Entry object, the new Entry object points to the original Entry object and forms a new Entry chain with it as its header. However, if the bucketIndex has no original Entry object, the new Entry object will point to NULL, creating a new Entry chain of length 1. A HashMap always adds a new element to the head of a list. In addition, if the number of elements in a HashMap exceeds the threshold, it will be expanded. In general, the capacity will be doubled.

4). HashMap: resize()

As the number of elements in a HashMap increases, the probability of collisions will increase and the length of subchains generated will become longer and longer, which is bound to affect the access speed of HashMap. To ensure the efficiency of HashMap, the system must expand at a critical point, where the number of elements in HashMap is equal to threshold (table array length * load factor). I have to say, however, that scaling is a time-consuming process, because it requires recalculating the positions of these elements in the new TABLE array and copying them. Therefore, if we can predict the number of elements in a HashMap in advance, the preset number of elements when constructing a HashMap can effectively improve the performance of the HashMap. Let’s look directly at the source code:

    /**
     * Rehashes the contents of this map into a new array with a
     * larger capacity.  This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;

        // If oldCapacity has reached its maximum value, set threshold to integer.max_value
        if (oldCapacity == MAXIMUM_CAPACITY) {  
            threshold = Integer.MAX_VALUE;
            return;             // Return directly
        }

        // Otherwise, create a larger array
        Entry[] newTable = new Entry[newCapacity];

        // Rehash each Entry into the new array
        transfer(newTable);

        table = newTable;
        threshold = (int)(newCapacity * loadFactor);  // Reset the threshold
    }
Copy the code

5). HashMap rehash: Transfer ()

Rehash is a process of recalculating the position of the elements in the original HashMap in the new table array and copying them.

/** * Transfers all entries from current table to newTable. */
    void transfer(Entry[] newTable) {

        // Assign the original array table to the array SRC
        Entry[] src = table;
        int newCapacity = newTable.length;

        // Add each chain in the array SRC back to newTable
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if(e ! =null) {
                src[j] = null;   / / SRC recycling

                // Add each element of each chain in turn to the corresponding bucket in newTable
                do {
                    Entry<K,V> next = e.next;

                    // e.hash refers to the return value of hash(key.hashcode ());
                    // Calculate the position in newTable, noting that elements originally in the same sliver chain may be assigned to different subchains
                    int i = indexFor(e.hash, newCapacity);   
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
                } while(e ! =null); }}}Copy the code

In particular, in the process of rehashing, Entry objects belonging to one bucket may be divided into different buckets. Because the capacity of HashMap changes, the value of H &(Length-1) will also change accordingly. In extreme cases, rehashing is meaningless if Entry objects belonging to the same bucket still belong to the same bucket.

2. Reading implementation of HashMap

Reading a HashMap is relatively simple compared to storing it. A HashMap uses the hash value of the key to locate a specific bucket in the table array, and then returns the value corresponding to the key.

/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        // If the value is null, call getForNullKey to return the corresponding value
        if (key == null)
            // Find the mapping from the first bucket of the table whose key is null; If no, return NULL
            return getForNullKey();  

        // Calculates the hash code based on the key's hashCode value
        int hash = hash(key.hashCode());
        // Find the bucket in the table array
        for(Entry<K,V> e = table[indexFor(hash, table.length)]; e ! =null;
             e = e.next) {
            Object k;
            // If the search key is the same as the search key, the corresponding value is returned
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }
Copy the code

Here, the value can be quickly obtained according to the key, which is not only inseparable from the data structure of the HashMap, but also closely related to Entry. As mentioned earlier, the stored procedure of a HashMap does not store keys and values separately, but as a whole key-value, which is an Entry object. In particular, in Entry objects, value is subordinate to key.

GetForNullKey (); HashMap; getForNullKey();

/** * Offloaded version of get() to look up null keys. Null keys map * to index 0. This null case is split out into separate methods * for the sake of performance in the two most commonly used * operations (get and put), but incorporated with conditionals in * others. */
    private V getForNullKey(a) {
        // If a NULL pair exists, it must be in the first bucket
        for (Entry<K,V> e = table[0]; e ! =null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        // If a NULL pair does not exist, NULL is returned
        return null;
    }
Copy the code
3, HashMap access summary

During the storage process, the system locates the bucket in the table array according to the hash value of the key, and puts the bucket in the corresponding linked list header. During the fetch process, the hash value of the key is also used to locate the bucket in which the Entry is in the table array, and then the bucket is searched and returned.