An illustration of the storage structure in a HashMap

The JDK 1.7 HashMap is implemented as an array plus singly linked lists. Although the hash function is designed to avoid collisions, two different keys can still be mapped to the same array slot because the array length is finite.

As the simple diagram above also shows, if a list holds too many nodes, searching it key by key is clearly inefficient. JDK 1.8 therefore improves the structure to array + linked list + red-black tree: when a list grows past the threshold of 8, it is converted into a red-black tree. For details, see my previous article on understanding the JDK 8 HashMap in depth.

As the figure above also shows, virtually every element is of type Entry, so let's look at the attributes an Entry holds (in 1.8, Entry is renamed Node, and it still implements Map.Entry).

// Each element of the hash table is an Entry (in 1.8 this class is renamed Node; it implements Map.Entry)
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;
    // Entry constructor: takes the hash, the key, the value, and the next node to point to
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }

    public final K getKey() { return key; }

    public final V getValue() { return value; }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }
    // The equals method: two entries are equal when both their keys and their values are equal
    public final boolean equals(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry e = (Map.Entry) o;
        Object k1 = getKey();
        Object k2 = e.getKey();
        if (k1 == k2 || (k1 != null && k1.equals(k2))) {
            Object v1 = getValue();
            Object v2 = e.getValue();
            if (v1 == v2 || (v1 != null && v1.equals(v2)))
                return true;
        }
        return false;
    }
    // Override Object.hashCode
    public final int hashCode() {
        return Objects.hashCode(getKey()) ^ Objects.hashCode(getValue());
    }

    public final String toString() {
        return getKey() + "=" + getValue();
    }

    // Called by put(k, v) when the key already exists, i.e. when an existing Entry's value is overwritten
    void recordAccess(HashMap<K,V> m) {}
    // Called whenever the entry is removed from the table
    void recordRemoval(HashMap<K,V> m) {}
}

Member variables in a HashMap and their meanings

// Default initial capacity = 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// Maximum capacity = 1 << 30
static final int MAXIMUM_CAPACITY = 1 << 30;

// Default load factor. The expansion trigger is: current size > DEFAULT_LOAD_FACTOR *
// capacity (for the defaults, 0.75f * 16 = 12)
static final float DEFAULT_LOAD_FACTOR = 0.75f;

// Shared empty table; table defaults to EMPTY_TABLE, so the first put must
// inflate it to a power-of-2 size
static final Entry<?,?>[] EMPTY_TABLE = {};
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

// The actual number of mappings in the map, which is not table.length
transient int size;

// When size reaches this threshold, a resize is performed
// Normally threshold = capacity * loadFactor
int threshold;

// Load factor of the hash table
final float loadFactor;

/**
 * The number of times this HashMap has been structurally modified.
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g., rehash).
 * This field is used to make iterators on Collection-views of the HashMap
 * fail-fast. (See ConcurrentModificationException.)
 */
transient int modCount;

// hashSeed takes part in computing the hash of a key: it is XORed with the
// key's hashCode. It is an instance-specific random value used to reduce
// hash collisions; if 0, the alternative hashing is disabled
transient int hashSeed = 0;

The HashMap constructor

Let’s look at the four constructors provided for us in the HashMap source code.

//(1) No-argument constructor:
// Constructs an empty table with DEFAULT_INITIAL_CAPACITY = 16 and load factor DEFAULT_LOAD_FACTOR = 0.75f
public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

//(2) Constructor specifying the initial capacity
// Constructs an empty table whose capacity is the passed initialCapacity; the load factor is DEFAULT_LOAD_FACTOR = 0.75f
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

//(3) Constructor specifying both the initial capacity and the load factor
public HashMap(int initialCapacity, float loadFactor) {
    // Validate the initial capacity; throw if it is negative
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    // Cap the initial capacity at MAXIMUM_CAPACITY
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    // Validate the load factor: throw if it is <= 0 or NaN
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    // Assign the instance fields
    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    // init() is an empty template method that subclasses can override if they need to extend behavior
    init();
}

From the first three constructors we can see that even when an initial capacity is specified, the table remains an empty array; the expansion threshold is set to the given capacity or the default capacity (the first two constructors simply delegate to the third). The array is only created before the first put operation (similar to the no-argument construction in JDK 8).

//(4) Constructor taking another Map
// Constructs a new map with the default load factor; the capacity is the larger of
// (the source map's size divided by the default load factor, plus 1) and the default capacity
public HashMap(Map<? extends K, ? extends V> m) {
    // Capacity: the larger of map.size()/0.75 + 1 and 16
    this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                  DEFAULT_INITIAL_CAPACITY),
         DEFAULT_LOAD_FACTOR);
    inflateTable(threshold);
    // Put all elements of the passed map into the HashMap being constructed
    putAllForCreate(m);
}

This constructor calls the inflateTable method before any put operation; that method creates the new table, after which putAllForCreate loads the elements of the incoming map. Note that at this point threshold still holds the initial capacity. These methods are described below.
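As a quick check of the capacity arithmetic this constructor performs, here is a standalone sketch (the class and method names are mine, not from the source) that computes the capacity argument for a few source-map sizes:

```java
public class CopyCtorCapacityDemo {
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    static final int DEFAULT_INITIAL_CAPACITY = 16;

    // The capacity argument the Map-copy constructor computes for a source of the given size
    static int initialCapacityFor(int sourceSize) {
        return Math.max((int) (sourceSize / DEFAULT_LOAD_FACTOR) + 1,
                        DEFAULT_INITIAL_CAPACITY);
    }

    public static void main(String[] args) {
        // floor(5/0.75)+1 = 7, which is below the default, so 16 wins
        System.out.println(initialCapacityFor(5));   // 16
        // floor(100/0.75)+1 = 134; inflateTable later rounds this up to 256
        System.out.println(initialCapacityFor(100)); // 134
    }
}
```

For 100 entries the threshold passed to inflateTable is 134, which roundUpToPowerOf2 then rounds up to a table of 256 slots.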

(1)inflateTable

This method is important. It is called directly in the fourth constructor; if one of the first three constructors was used to create the map, it is called on the first put to initialize the table.

private void inflateTable(int toSize) {
    // Round up to the smallest power of 2 not less than toSize, capped at MAXIMUM_CAPACITY
    // (similar to tableSizeFor in the JDK 8 implementation)
    int capacity = roundUpToPowerOf2(toSize);
    // The expansion threshold is the smaller of (capacity * load factor) and (MAXIMUM_CAPACITY + 1)
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    // Create the table array
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}

(2)roundUpToPowerOf2 description

private static int roundUpToPowerOf2(int number) {
    // number >= 0 is assumed
    //(1) number >= MAXIMUM_CAPACITY: return MAXIMUM_CAPACITY
    //(2) 0 <= number <= 1: return 1
    //(3) 1 < number < MAXIMUM_CAPACITY: return the smallest power of 2 not less than number
    return number >= MAXIMUM_CAPACITY
        ? MAXIMUM_CAPACITY
        : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}
// Integer.highestOneBit plays a role similar to tableSizeFor in the JDK 8 implementation
public static int highestOneBit(int i) {
    // Since i > 0 here, the sign bit is 0, so >> behaves the same as >>> (zeros shifted in).
    // Example: suppose i = 5 = 0101
    i |= (i >>  1); // (1) i>>1 = 0010; (2) i = 0101 | 0010 = 0111
    i |= (i >>  2); // (1) i>>2 = 0001; (2) i = 0111 | 0001 = 0111
    i |= (i >>  4); // (1) i>>4 = 0000; (2) i = 0111 | 0000 = 0111
    i |= (i >>  8); // i stays 0111
    i |= (i >> 16); // i stays 0111
    return i - (i >>> 1); // (1) 0111 >>> 1 = 0011; (2) 0111 - 0011 = 0100 = 4
    // So highestOneBit(5) returns 4: all bits below the highest set bit are filled in,
    // then subtracting the half leaves only the highest bit.
    // Note that in roundUpToPowerOf2 the shift happens before the call: for number = 5
    // the argument is (5 - 1) << 1 = 8, and highestOneBit(8) = 8, the smallest power
    // of 2 not less than 5.
}

(3)putAllForCreate method description

This method iterates over the entries of the passed map and adds each one to this map instance. Let's look at its implementation details.

private void putAllForCreate(Map<? extends K, ? extends V> m) {
    // Call putForCreate for each entry of the source map
    for (Map.Entry<? extends K, ? extends V> e : m.entrySet())
        putForCreate(e.getKey(), e.getValue());
}

The putForCreate method's implementation

private void putForCreate(K key, V value) {
    // If the key is null its hash is 0; otherwise call the hash() method to compute it
    int hash = null == key ? 0 : hash(key);
    // Compute the index into the table array from the hash just calculated
    int i = indexFor(hash, table.length);
    // Walk the list in that bucket; if an entry with an equal key exists, replace its old value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) {
            e.value = value;
            return;
        }
    }
    // The key differs from every key already in the list, so add a new node
    createEntry(hash, key, value, i);
}

(4) Implementation of createEntry method

void createEntry(int hash, K key, V value, int bucketIndex) {
    // Save the current head of the list in the specified bucket
    Entry<K,V> e = table[bucketIndex];
    // The Entry constructor (Entry(int h, K k, V v, Entry<K,V> n)) links the new
    // node in front of the old head, so the new node becomes the head of the list
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++; // Increment the element count of the hash table
}

How do you determine the location of an element in a bucket

The algorithm used to compute the hash in 1.7 differs from the 1.8 implementation. The hash value determines where a new element goes on put, where to look on get, and which index to use on remove. So let's look at these two methods.

(1) the hash method

final int hash(Object k) {
    int h = hashSeed;
    // hashSeed defaults to 0; when it is non-zero and the key is a String,
    // the alternative stringHash32 is used instead
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }
    // The following perturbs the key's hashCode to prevent collisions between
    // hashCodes that differ only in the high bits. Put simply, mixing the
    // high-order bits into the low-order bits lets every bit of the hashCode
    // influence the final result
    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Let's use an example to illustrate why perturbing the key's hashCode matters. Suppose we put a key-value pair whose key is "fsmly", and its raw hashCode is "0000_0000_0011_0110_0100_0100_1001_0010". If the table length is 16, the index is computed by ANDing with 15, whose 32-bit binary representation is "00000000000000000000000000001111". Whatever the upper 28 bits are, they contribute nothing to the result (ANDing anything with 0 gives 0), so the index of a put Entry would depend entirely on the low bits of the key's hashCode, greatly increasing the collision probability. See the figure below.

Since the table length is limited, a method with such a high collision probability is unsuitable, so the hashCode is perturbed to reduce collisions; JDK 7 uses the four shift-and-XOR operations shown above for this. The following simple example shows the process: hashCodes that collided without perturbation no longer do after processing.

To summarize: we first compute the hash of the passed key, then determine its position in the table via the indexFor method below. The implementation ANDs the computed hash with length-1; for a power-of-two length, length-1 in binary has all its low bits set (length 16, length-1 = 15 = 1111). Combined with the four perturbation steps above, every bit of the hashCode can influence the index, so the data is spread more evenly across buckets and hash conflicts are reduced. For more on the rationale and details of hash methods in Java collections, see this hash() method analysis.
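The effect can be checked with a small standalone sketch (the two hashCode values 0x10000001 and 0x20000001 are example inputs of my own, not from the article; hashSeed is assumed to be 0 and the key is not a String, so only the perturbation path runs). They differ only in the high bits, so without perturbation both land in bucket 1 of a 16-slot table; after the shift-and-XOR steps they land in different buckets:

```java
public class PerturbDemo {
    // Standalone copy of JDK 7's perturbation steps (hashSeed assumed 0)
    static int hash(int hashCode) {
        int h = hashCode;
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int k1 = 0x10000001, k2 = 0x20000001; // differ only in the high bits
        // Without perturbation both collapse to bucket 1
        System.out.println(indexFor(k1, 16)); // 1
        System.out.println(indexFor(k2, 16)); // 1
        // With perturbation the high bits influence the index
        System.out.println(indexFor(hash(k1), 16)); // 3
        System.out.println(indexFor(hash(k2), 16)); // 5
    }
}
```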

(2) indexFor method

static int indexFor(int h, int length) {
    // Compute the index as hash & (length - 1)
    return h & (length - 1);
}

The implementation performs a bitwise AND between the computed hash of the key and length-1 of the map's array, yielding the table index at which the Entry is put. A concrete calculation was already shown in the hash method discussion above, so it is not repeated here.
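For a power-of-two length, h & (length - 1) is equivalent to taking a non-negative remainder, which is exactly why the table size must be a power of two. A quick sketch (class name is mine) checks this against Math.floorMod, including for negative hashes:

```java
public class IndexForDemo {
    // Standalone copy of JDK 7's indexFor
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of 2 for the bit trick to work
        int[] hashes = {0, 5, 31, -1, -17, Integer.MIN_VALUE};
        for (int h : hashes) {
            // Math.floorMod gives the mathematically non-negative remainder
            System.out.println(indexFor(h, length) == Math.floorMod(h, length));
        }
    }
}
```

For example, -1 & 15 = 15, matching floorMod(-1, 16); a plain % would have produced -1 here, which is why the mask is both faster and safer than the remainder operator.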

Analysis of the put method

(1) put method

public V put(K key, V value) {
    // Of the four HashMap constructors, only the Map-argument one initializes the
    // table array; the other three only set the threshold and load factor. So for
    // a map created with those three, table is still {} when put is first called
    // and must be initialized here
    if (table == EMPTY_TABLE) {
        // inflateTable rounds threshold up to the smallest power of 2 not less
        // than it, capped at MAXIMUM_CAPACITY
        inflateTable(threshold);
    }
    // A null key is handled separately: insert the k-v pair via putForNullKey
    if (key == null)
        return putForNullKey(value);

    // Compute the hash of the key passed to put
    int hash = hash(key);
    // Compute the bucket index from the hash and the table length
    int i = indexFor(hash, table.length);
    // Walk the list at table[i] (1.7 resolves hash conflicts by chaining, so this
    // loop handles the conflict case); if an entry with an equal key exists,
    // replace its oldValue with value
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            // Empty hook method for subclasses to override
            e.recordAccess(this);
            return oldValue;
        }
    }

    // Modify modCount (more on this in a later summary)
    modCount++;
    // The key was not found, so call addEntry to add a new node
    addEntry(hash, key, value, i);
    return null;
}

(2)putForNullKey method analysis

The null key is always stored in the list at table[0]. putForNullKey walks that list; if an entry whose key is null already exists, its old value is simply replaced with the value passed in. Otherwise a new node is created and added at table[0].

// Find the Entry whose key is null in the table array and replace its value
private V putForNullKey(V value) {
    // Iterate over the list at table[0]
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        // Found the null key
        if (e.key == null) {
            // Replace its value with the one passed in
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue; // Return the old value
        }
    }
    modCount++;
    // Not found: add a new node to the list in bucket 0 (hash 0, key null)
    addEntry(0, null, value, 0);
    return null;
}

(3) Analysis of addEntry method

The addEntry method first checks whether the current size has reached the threshold to decide whether to resize, then creates a new node and inserts it at the head of the list (that is, at the given index in the table array).

/* With a singly linked list, finding a node means walking from the head. One might
   ask why not use a two-dimensional array with linear probing to handle conflicts:
   array insertion is also O(1), but the biggest drawback of an array is that deleting
   anywhere except the end is very inefficient, and if the added data were evenly
   distributed, every bucket of the array would have to reserve memory up front. */
void addEntry(int hash, K key, V value, int bucketIndex) {
    // Resize only when both conditions hold:
    // (1) size has reached the threshold
    // (2) the target bucket in the table is not null
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // The threshold is exceeded, so double the capacity
        resize(2 * table.length);
        // After the resize, recompute the hash (0 for a null key)...
        hash = (null != key) ? hash(key) : 0;
        // ...and recompute the index from that hash
        bucketIndex = indexFor(hash, table.length);
    }
    // Create the new Entry node (whether or not a resize happened)
    createEntry(hash, key, value, bucketIndex);
}

(4) Summarize the execution process of put method

  1. First, check whether the table array is empty; if it is, call inflateTable to create it.
  2. If the key is null, call putForNullKey to store the value in the list at table[0].
  3. Call the hash() method to perturb the key's hashCode, then AND the result with length-1 to get the index into the array.
  4. Walk the list at that index; if an entry is found whose hash matches the passed key's hash and whose key equals the passed key, overwrite its value.
  5. Otherwise, create a new node and insert it at the head of the list.
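The steps above can be condensed into a toy sketch (simplified names of my own, fixed capacity, no resize or hash perturbation; null keys happen to work via Objects.hashCode/equals rather than a dedicated table[0] path, so this is an illustration of the flow, not the real source):

```java
import java.util.Objects;

public class MiniMap<K, V> {
    static class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K k, V v, Node<K, V> n) { key = k; value = v; next = n; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = new Node[16]; // fixed power-of-2 capacity

    public V put(K key, V value) {
        int i = Objects.hashCode(key) & (table.length - 1); // steps 1-3: index
        for (Node<K, V> e = table[i]; e != null; e = e.next) { // step 4: scan list
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value; // overwrite the existing value
                return old;
            }
        }
        table[i] = new Node<>(key, value, table[i]); // step 5: head insertion
        return null;
    }

    public V get(K key) {
        int i = Objects.hashCode(key) & (table.length - 1);
        for (Node<K, V> e = table[i]; e != null; e = e.next)
            if (Objects.equals(e.key, key)) return e.value;
        return null;
    }

    public static void main(String[] args) {
        MiniMap<String, Integer> m = new MiniMap<>();
        m.put("a", 1);
        m.put("a", 2); // same key: value overwritten
        m.put("b", 3);
        System.out.println(m.get("a")); // 2
        System.out.println(m.get("b")); // 3
    }
}
```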

Resize analysis

(1) The general process of resize

void resize(int newCapacity) {
    // Save a reference to the old table array
    Entry[] oldTable = table;
    // Save the old table's length
    int oldCapacity = oldTable.length;
    // If the old capacity is already the maximum, just raise the threshold and return
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    // Create the new hash table array with the new capacity
    Entry[] newTable = new Entry[newCapacity];
    // Call transfer to move the entries over
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    // Point table at the new array
    table = newTable;
    // Update the threshold
    threshold = (int) Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

(2) Analysis of transfer method

The transfer method traverses all entries in the old array, recomputes each node's index for the new capacity, and head-inserts it into the new array.

void transfer(Entry[] newTable, boolean rehash) {
    // The length of the new array
    int newCapacity = newTable.length;
    // Iterate over every bucket of the old array
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                // Recalculate the hash value
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Call indexFor again to compute the bucket index for the new capacity
            int i = indexFor(e.hash, newCapacity);
            // Head insertion. Suppose the original list is a->b->c:
            // (1) for the first node: e.next = null (the new bucket is empty); newTable[i] = e; e = next
            // (2) for each following node: e.next = current head; newTable[i] = e; e = next
            // (3) so nodes that land in the same new bucket end up in reverse order
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

The key to this method is understanding how the structure of a list differs between the old table and the new one. Suppose entry1->entry2->entry3 is the list at index 4 of the original table, and all three nodes still compute to index 4 in the new array; the process is roughly as shown in the figure below.
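The reversal can be reproduced with a minimal single-threaded sketch of the transfer loop, assuming all nodes of one old bucket land in the same new bucket (illustrative names of my own, not the real source):

```java
public class TransferDemo {
    static class Entry {
        final String name; Entry next;
        Entry(String name, Entry next) { this.name = name; this.next = next; }
    }

    // Mimic the core of JDK 7's transfer for one old bucket whose nodes all
    // map to the same index in the new table: head-insert them one by one
    static Entry transferBucket(Entry oldHead) {
        Entry newHead = null;
        Entry e = oldHead;
        while (e != null) {
            Entry next = e.next;  // Entry<K,V> next = e.next;
            e.next = newHead;     // e.next = newTable[i];
            newHead = e;          // newTable[i] = e;
            e = next;             // e = next;
        }
        return newHead;
    }

    public static void main(String[] args) {
        // Old bucket: entry1 -> entry2 -> entry3
        Entry old = new Entry("entry1", new Entry("entry2", new Entry("entry3", null)));
        Entry head = transferBucket(old);
        // New bucket order is reversed: entry3 -> entry2 -> entry1
        for (Entry e = head; e != null; e = e.next) System.out.println(e.name);
    }
}
```

This reversal is harmless in a single thread, but it is exactly what makes the concurrent resize dangerous, as the circular-list section at the end of the article shows.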

(3) resize expansion method summary

  1. Create a new array twice the original length (if the capacity is already at the maximum, only the threshold is raised to Integer.MAX_VALUE)
  2. Call the transfer method to move the entries from the old table into the new array, as shown above
  3. Point table at the new array and update the threshold

Analysis of the get method

// The get method delegates to getEntry and, if the result is not null, returns the entry's value
public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}

As you can see, the get method calls getEntry to query the Entry object and then returns the value of the Entry. So let’s look at the implementation of the getEntry method

The getEntry method

// This is the implementation of getEntry
final Entry<K,V> getEntry(Object key) {
    // An empty map naturally returns null
    if (size == 0) {
        return null;
    }
    // Compute the hash from the passed key (0 for a null key)
    int hash = (key == null) ? 0 : hash(key);
    // Compute the index, then walk the corresponding linked list to find the Entry
    for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null;
         e = e.next) {
        Object k;
        // Match when the hash is equal and the key is equal
        if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

The getForNullKey method

// This method looks directly for the null key
private V getForNullKey() {
    if (size == 0) {
        return null;
    }
    // Search only the list at index 0 (there is at most one null key)
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        // Found the null key: return its value directly
        if (e.key == null)
            return e.value;
    }
    return null;
}

A brief summary of the JDK 1.7 implementation

(1) Because the put operation handles the null key separately via the putForNullKey method, HashMap allows null as a key.

(2) When computing the table index, the hash() method perturbs the key's hashCode, and the result is ANDed with length-1; length-1 has all its low bits set, which helps distribute entries uniformly.

(3) get, put, and resize all hash the key's hashCode. Mutable objects can easily change their hashCode, which makes them poor keys.

(4) HashMap is not thread-safe. Resizing in a multi-threaded environment can produce a circular linked list, and thus an infinite loop.

(5) When conflicts occur, HashMap resolves them by chaining (the separate-chaining, i.e. linked-list, method).

(6) The default initial capacity is 16 with a load factor of 0.75 (threshold 12). A smaller capacity such as 8 would give a threshold of only 6, causing frequent expansion, while 32 might be less space-efficient.
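Point (3) is easy to demonstrate: if a key's hashCode changes after insertion, later lookups compute a different hash and bucket and miss the entry. Here is a sketch with a deliberately bad key class of my own, for illustration only:

```java
import java.util.HashMap;

public class MutableKeyDemo {
    // A deliberately bad key whose hashCode depends on mutable state
    static class BadKey {
        int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        HashMap<BadKey, String> map = new HashMap<>();
        BadKey key = new BadKey(1);
        map.put(key, "value");
        System.out.println(map.get(key)); // value

        key.id = 2; // mutate the key after insertion: its hashCode changes
        // The lookup now hashes to a different bucket, so the entry is "lost"
        System.out.println(map.get(key));               // null
        System.out.println(map.containsValue("value")); // true: the entry is still stored
        System.out.println(map.size());                 // 1
    }
}
```

The entry is unreachable by key but still occupies the map, which is why immutable types like String and Integer are the usual choice for keys.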

Diagram of the circular linked list problem under concurrency in JDK 7

When we talked about the resize method above, we also illustrated the resize process with an example, so we won’t show the single-threaded execution process here. Let’s start by remembering a few lines of core code in the resize method

Entry<K,V> next = e.next;
// ... recompute hash and index ...
e.next = newTable[i];
newTable[i] = e;
e = next;

The main lines of the transfer method called in the resize method are the above four lines. Let’s briefly simulate the resize process assuming two threads, Thread1 and Thread2.

(1) Before the resize, assume the table length is 2; adding entry4 now requires an expansion.

(2) Suppose Thread1 has just executed the line Entry<K,V> next = e.next; let's annotate the state based on the lines above.

(3) Thread scheduling then switches to Thread2. Suppose Thread2 completes the entire transfer method (assume entry3 and entry4 end up at the positions shown in the figure below after the expansion; here we focus mainly on entry1 and entry2). The result is:

(4) Thread1 is then scheduled again and continues: it inserts entry1 into the new array, after which e is entry2, and because of Thread2's work, next becomes entry1 on the following loop iteration:

  • newTable[i] = e; when Thread1 executes this, e points to entry1
  • then e = next, so e now points to entry2 (next was pointing at entry2)
  • on the next loop iteration, next = e.next, which (as a result of Thread2's execution) is entry2.next = entry1, so next comes to point at entry1

See the figure below

(5) Thread1 continues execution: it takes entry2 and head-inserts it at newTable[1], then advances e and next.

(6) Next it executes e.next = newTable[1], i.e. entry1.next = entry2. But entry2.next already points to entry1 (Thread2's execution produced entry2->entry1, see the diagram above), so entry1 and entry2 now point at each other: a circular linked list has appeared, and any later traversal of this bucket loops forever.