Writing in the front
HashMap is one of the most commonly used maps and an important member of the Java Collection Framework. This paper first gives the essence of HashMap and summarizes its relationship with Map and HashSet, and then gives the definition of HashMap in JDK, and analyzes its four construction methods combined with source code. Finally, through the analysis of the data structure, implementation principle and source code implementation of HashMap, the underlying Hash storage mechanism is deeply explored, the reason why the length of the underlying array is always 2 to the power of n is explained, and the principle and implementation of its fast access, expansion and rehash after expansion are revealed.
HashMap in Depth: This article gives you a thorough look at HashMap
All the source code for HashMap in this article is based on JDK 1.6**. There may be some differences between different JDK versions, but it does not affect our overall understanding of the data structure and principle of HashMap.
A HashMap overview
A Map is an abstract interface for key-value mappings that do not contain duplicate keys, that is, one Key corresponds to one Value. HashMap is an important member of the Java Collection Framework and one of the most commonly used of the Map family (shown below). Simply put, a HashMap is an implementation of the Map interface based on a hash table, in the form of key-value, that is, the stored object is an Entry (containing both a Key and a Value). In a HashMap, it calculates the location of a key-value based on the hash algorithm and accesses it quickly. In particular, a HashMap allows at most one Entry to have a Null key (multiple entries will be overwritten), but allows multiple entries to have Null values. In addition, a HashMap is an asynchronous implementation of a Map.
HashMap in Depth: This article gives you a thorough look at HashMap
Although HashMap and HashSet implementations have different interface specifications, their underlying Hash storage mechanism is identical. In fact, the HashSet itself is implemented on top of the HashMap
Definition of HashMap in JDK
HashMap implements the Map interface and inherits the AbstractMap abstract class, which defines key-value mapping rules. Similar to the AbstractCollection abstract class in the Collection family, the AbstractMap abstract class provides a backbone implementation of the Map interface to minimize the effort required to implement the Map interface. HashMap is defined in the JDK as:
public class HashMap<K, V> extends AbstractMap<K, V>
implements Map<K, V>, Cloneable, Serializable {
...
}
Copy the code
Constructor of the HashMap
There are four constructors provided by HashMap. The constructor that takes no arguments by default and the constructor that takes Map are recommended implementations of the Java Collection Framework specification. The other two constructors are provided by HashMap.
1, a HashMap ()
This constructor is intended to construct an empty HashMap with > default initial capacity (16) and default load factor (0.75). It is recommended by the Java Collection Framework specification and the source code is as follows:
/** * Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75). */ Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75) HashMap() {// loadFactor: used to measure how much space is used in a hash table this.loadFactor = DEFAULT_LOAD_FACTOR; //HashMap capacity expansion threshold. Its value is equal to the capacity of HashMap multiplied by the load factor threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR); Table = new Entry[DEFAULT_INITIAL_CAPACITY]; init(); }Copy the code
2, HashMap(int initialCapacity, float loadFactor)
This constructor is intended to construct an empty HashMap that specifies an initial capacity and a specified load factor. The source code is as follows:
/** * Constructs an empty HashMap with the specified initial capacity and load factor. */ public HashMap(int initialCapacity, Float loadFactor) {if (initialCapacity < 0) throw new IllegalArgumentException("Illegal Initial Capacity: " + initialCapacity); If (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; / / load factor not less than 0 if (loadFactor < = 0 | | Float. The isNaN (loadFactor)) throw new IllegalArgumentException (" Illegal load factor: " + loadFactor); // The capacity of the HashMap must be a power of 2, exceeding initialCapacity's minimum 2^n int capacity = 1; while (capacity < initialCapacity) capacity <<= 1; // loadFactor this.loadFactor = loadFactor; Threshold = (int)(capacity * loadFactor); // Set the capacity limit of the HashMap. When the capacity of the HashMap reaches this limit, the system automatically expands the capacity. Table = new Entry[capacity]; init(); }Copy the code
3, a HashMap (int initialCapacity)
This constructor is intended to construct an empty HashMap that specifies the initial capacity and default load factor (0.75). The source code is as follows:
// Constructs an empty HashMap with the specified initial capacity and the default load factor (0.75) public HashMap(int initialCapacity) { this(initialCapacity, DEFAULT_LOAD_FACTOR); // Call the above constructor directly}Copy the code
4, a HashMap (Map <? extends K, ? extends V> m)
This constructor is intended to construct a HashMap that has the same mapping as the specified Map, with an initial capacity of at least 16 (depending on the size of the specified Map) and a load factor of 0.75, as recommended by the Java Collection Framework specification.
// Constructs a new HashMap with the same mappings as the specified Map. // The HashMap is created with default load Factor (0.75) and an initial capacity // sufficient to hold the mappings in the specified Map. Public HashMap(Map<? extends K, ? Extends V> m) {this(math.max ((int) (m.size()/DEFAULT_LOAD_FACTOR) + 1, DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR); putAllForCreate(m); }Copy the code
Here, we mention two very important parameters: initial capacity and load factor, which are important parameters that affect the performance of a HashMap. Capacity refers to the number of buckets in the hash table (the size of the table array). The initial capacity is the number of buckets when the hash table is created. Load factor is a measure of how full a hash table can be before its capacity is automatically increased. It measures how much space is used in a hash table. A larger load factor means that the hash table is more loaded, and vice versa.
The data structure of the HashMap
Related concepts of hashing
A Hash is a Hash algorithm that transforms an input of any length (also known as a pre-image, or pre-image) into a fixed-length output (usually an integer), which is the Hash value. This transformation is a compression mapping, that is, the space of the hash value is usually much smaller than the space of the input. Different inputs may be hashed into the same output, making it impossible to determine the input value uniquely from the hash value. Simply put, it is a kind of information digest function that compresses the message of arbitrary length to a certain fixed length.
Applications of hashing: data structures
As we know, arrays are easy to address, difficult to insert and delete; The characteristics of linked lists are: difficult to address, easy to insert and delete. So can we combine the properties of both to create a data structure that is easy to address, easy to insert and easy to delete? And the answer is yes, that’s the hash table we’re talking about. In fact, there are many different ways to implement a hash table. The most classic one, the zipper method, can be thought of as an array of linked lists, as shown below:
HashMap in Depth: This article gives you a thorough look at HashMap
As we can see from the figure above, the left-hand side is clearly an array, and each member of the array is a linked list. All elements contained in the data structure contain a pointer for linking between elements. We assign elements to linked lists based on their characteristics, and in turn we find the correct list and the correct element from the linked list. Among them, the method of calculating the subscript of element array according to element characteristics is hashing algorithm.
In general, hash tables are suitable for use as basic data structures for quick lookups and deletions, usually requiring a total amount of data to fit into memory. When using a hash table, there are several key points:
- Selection of hash functions: specific hash methods for different objects (strings, integers, etc.);
- Collision processing: commonly used in two ways, one is open hashing, namely > zipper method;
- The other is closed hashing, which is the Opened Addressing.
The data structure of the HashMap
As we know, the two most commonly used structures in Java are arrays and linked lists. Almost all data structures can be combined with these two structures, and HashMap is a typical example of this application. In fact, a HashMap is a linked list array, and its data structure is as follows:
HashMap in Depth: This article gives you a thorough look at HashMap
From the figure above, we can see visually that the underlying implementation of HashMap is still an array, except that each item of the array is a chain. The parameter initialCapacity represents the length of the array, which is the number of buckets. In section 3 we looked at the source of the default constructor for HashMap:
/** * Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75). */ Constructs an empty HashMap with the default initial capacity * (16) and the default load factor (0.75) HashMap() {// loadFactor: used to measure how much space is used in a hash table this.loadFactor = DEFAULT_LOAD_FACTOR; //HashMap capacity expansion threshold. Its value is equal to the capacity of HashMap multiplied by the load factor threshold = (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR); Table = new Entry[DEFAULT_INITIAL_CAPACITY]; init(); }Copy the code
Each time a HashMap is created, it initializes a table array of Entry type. The Entry type is defined as follows:
static class Entry<K,V> implements Map.Entry<K,V> { final K key; // V value; // The value of the key-value pair Entry<K,V> next; // Next node final int hash; /** * Creates new entry. */ entry (int h, K K, V V, Entry<K,V> n) {// Entry's constructor value = V; next = n; key = k; hash = h; }... }Copy the code
Entry is the internal class of HashMap, realizing the map. Entry interface, which contains four attributes: key, value, next node, and hash value. In fact, Entry is the building block of a hash table, the concrete form of the elements that the hash table stores.
Quick access to HashMap
Let’s look at the implementation of HashMap access using JDK source code.
Storage implementation of HashMap
In a HashMap, key-value pairs are stored using the put(key,vlaue) method. The source code is as follows:
/** * Associates the specified value with the specified key in this map. * If the map previously contained a mapping for the key, the old * value is replaced. * * @param key key with which the specified value is to be associated * @param value value to be associated with the specified key * @return the previous value associated with key, or null if there was no mapping for key. * Note that a null return can also indicate that the map previously associated */ public V put(K key, V value) {// putForNullKey is putForNullKey. If (key == NULL) return putForNullKey(value); Int hash = hash(key.hashcode ()); // ------- (1) // Calculate the storage position of the key-value pair in the array (which bucket) int I = indexFor(hash, table.length); For (Entry<K,V> e = table[I]; for (Entry<K,V> e = table[I]; e ! = null; e = e.next) { // ------- (3) Object k; // Determine if there is a mapping on the chain with the same hash value and the same key value. If there is, overwrite value directly. And returns the old value if (e.h ash = = hash && ((k = e.k ey) = = key | | key. Equals (k))) {V oldValue = e.v alue. e.value = value; e.recordAccess(this); return oldValue; // return the old value}} modCount++; // This map does not exist in the original HashMap. Add this map to the header addEntry(hash, key, value, I) of the chain. return null; }Copy the code
Special handling of NULL keys: putForNullKey()
Let’s look directly at the source code:
/** * Offloaded version of put for null keys */ private V putForNullKey(V value) { Table [0] for (Entry<K,V> e = table[0]; e ! = null; E = e.next) {if (e.key == null) {// if there is already a key with a null key, replace its value and return the oldValue V oldValue = e.value; e.value = value; e.recordAccess(this); return oldValue; } } modCount++; AddEntry (0, null, value, 0); // Otherwise, add it to the bucket of table[0] return null; }Copy the code
Hash Strategy in HashMap (algorithm)
/**
* Applies a supplemental hash function to a given hashCode, which
* defends against poor quality hash functions. This is critical
* because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits.
*
* Note: Null keys always map to hash 0, thus index 0.
*/
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
Copy the code
As the JDK officially describes this method, the use of the hash() method to recalulate an object’s hashCode is intended to prevent a poor-quality implementation of the hashCode() function. Since the length of the supporting array of a hashMap is always a power of 2, the right shift can make the lower-order data as different as possible, thus making the hash value as evenly distributed as possible.
After calculating the hash value of the Key using the hash() method above, how can we ensure that the elements are evenly distributed in each bucket of the table? HashMap (indexFor(), indexFor()));
/** * * Returns index for hash code h. * */ static int indexFor( int h, int length ) { return(h & (length - 1) ); /* Function is equivalent to modulo operation, but this way is more efficient */}Copy the code
Add key-value pairs to HashMap: addEntry()
Let’s look directly at the source code:
/** * Adds a new entry with the specified key, value and hash code to * the specified bucket. It is the responsibility of this * method to resize the table if Appropriate. * * Subclass overrides this to alter the behavior of put method. * * always add a new element to the head of the list */ void addEntry(int Hash, K key, V value, int bucketIndex) {// Get the linked list Entry at bucketIndex <K,V> e = table[bucketIndex]; Table [bucketIndex] = new Entry<K,V>(hash, key, value, e); If (size++ >= threshold) resize(2 * table.length); }Copy the code
HashMap expansion: resize()
As the number of elements in a HashMap increases, the probability of collisions will increase and the length of subchains generated will become longer and longer, which is bound to affect the access speed of HashMap. ** To ensure the efficiency of HashMap, the system must expand at a critical point, where the number of elements in HashMap is equal to threshold (table array length loading factor). I have to say, however, that scaling is a time-consuming process, because it requires recalculating the positions of these elements in the new TABLE array and copying them. Therefore, if we can predict the number of elements in a HashMap in advance, the preset number of elements when constructing a HashMap can effectively improve the performance of the HashMap *. Let’s look directly at the source code:
/** * * Rehashes the contents of this map into a new array with a * * larger capacity. This method is called automatically when the * * number of keys in this map reaches its threshold. * * * * If current capacity is MAXIMUM_CAPACITY, this method does not * * resize the map, but sets threshold to Integer.MAX_VALUE. * * This has the effect of preventing future calls. * * * * @param newCapacity the new capacity, MUST be a power of two; * * must be greater than current capacity unless current * * capacity is MAXIMUM_CAPACITY (in which case value * * is irrelevant). * */ void resize( int newCapacity ) { Entry[] oldTable = table; int oldCapacity = oldTable.length; /* If oldCapacity has reached the maximum value, Set threshold to integer. MAX_VALUE */ if (oldCapacity == MAXIMUM_CAPACITY) {threshold = integer.max_value; return; /* Returns */} /* Otherwise, create a larger array */ Entry[] newTable = new Entry[newCapacity]; /* Rehash each Entry into a new array */ transfer(newTable); table = newTable; threshold = (int) (newCapacity * loadFactor); /* Reset threshold */}Copy the code
HashMap rehash: Transfer ()
Rehash is a process of recalculating the position of the elements in the original HashMap in the new table array and copying them.
/** ** Transfers all entries from current table to newTable. ** / void transfer(Entry[] newTable) {/* Transfers all entries from current table to newTable Assign an array SRC */ Entry[] SRC = table; int newCapacity = newTable.length; /* Add each chain from array SRC to newTable */ for (int j = 0; j < src.length; j++ ) { Entry<K, V> e = src[j]; if ( e ! = null ) { src[j] = null; /* SRC reclaim */ /* add each element of each chain in turn to the corresponding bucket in newTable */ do {Entry<K, V> next = e.next; /* e.hash refers to the return value of hash(key.hashcode ()); */ int I = indexFor(e.hash, newCapacity); */ int I = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; } while ( e ! = null ); }}}Copy the code
In particular, in the process of rehashing, Entry objects belonging to one bucket may be divided into different buckets. Because the capacity of HashMap changes, the value of H &(Length-1) will also change accordingly. In extreme cases, rehashing is meaningless if Entry objects belonging to the same bucket still belong to the same bucket.
Read implementation of HashMap
Reading a HashMap is relatively simple compared to storing it. A HashMap uses the hash value of the key to locate a specific bucket in the table array, and then returns the value corresponding to the key.
/** * Returns the value to which the specified key is mapped, * or {@code null} if this map contains no mapping for the key. * <p>More formally, if this map contains a mapping from a key * {@code k} to a value {@code v} such that {@code (key==null ? k==null : * key.equals(k))}, then this method returns {@code v}; otherwise * it returns {@code null}. (There can be at most one such mapping.) * <p>A return value of {@code null} does not <i>necessarily</i> * indicate that the map contains no mapping for the key; it's also * possible that the map explicitly maps the key to {@code null}. * The {@link #containsKey containsKey} operation may be used to * distinguish these two cases. * @see #put(Object, Object) */ public V get(Object key) { Call getForNullKey to return the corresponding value */ if (key == NULL) /* Find the null mapping from the first bucket of the table; Return (getForNullKey()); */ int hash = hash(key.hashcode ()); */ for (Entry<K, V> e = table[indexFor(hash, table.length)]; e ! = null; e = e.next ) { Object k; / * if the search key and find the key of the same, it returns the corresponding value * / if (e.h ash = = hash && ((k = e.k ey) = = key | | key. Equals (k))) return (e.v alue); } return(null); }Copy the code
For key-value pairs with NULL keys, HashMap provides a special handler: getForNullKey().
/** * * Offloaded version of get() to look up null keys. Null keys map * * to index 0\. This null case is split out into separate methods * * for the sake of performance in the two most commonly used * * operations (get and put), But incorporated with conditionals in * * others. * */ private V getForNullKey() {/ */ for (Entry<K, V> e = table[0]; e ! = null; e = e.next ) { if ( e.key == null ) return(e.value); } /* Return NULL */ return(NULL) if a key/value pair does not exist; }Copy the code
Therefore, if the return value is NULL after the get(Object key) method of a HashMap is called, the following two possibilities exist:
- The value of this key is NULL.
- The key does not exist in the HashMap.