Introduction to HashMap
The HashMap is located in the java.util directory of rt.jar, which comes with the JDK. A HashMap is a hash table that stores a mapping of key-value pairs <key,value>.
AbstractMap, AbstractMap, AbstractMap, Cloneable, Serializable interface HashMap is thread unsafe, where key and value can be null and unordered.
2. Data structure of HashMap
The underlying implementation of HashMap is mainly based on arrays and linked lists, and its query speed is relatively fast mainly because it calculates hash codes to determine the location of storage. In a HashMap, the key’s hashCode is used to calculate the hash value. If the hashCode is the same, the calculated hash value is the same. If there are too many pairs of objects, it is possible that different objects will have the same hash value, which is called hash conflict. Those of you who have studied data structures know that there are many ways to resolve hash conflicts. At the bottom of a HashMap, hash conflicts are resolved through linked lists.
Each element of the array is the head node of a single linked list. Linked lists are used to resolve conflicts. If different keys map to the same location in the array, they are put into the single linked list.
Let’s look at the code for the Entry class in HashMap:
A HashMap is an array of entries containing keys and values. Next is also an Entry object, which is used to handle hash conflicts and form a linked list.
Three, HashMap source code analysis
1. Key attributes
Let’s take a look at some key attributes in the HashMap class:
transient Entry[] table; // An array of entities that store elements
transient int size; // The number of elements to store
int threshold; // Threshold When the actual size exceeds the threshold, the system expands the capacity. Threshold = Load factor x Capacity
final float loadFactor; // Load factor
transient int modCount; // The number of times it has been modified
Where loadFactor is the degree to which elements in the Hsah table are filled.
For example, the larger the loading factor, the more elements are filled. The advantage is that the space utilization is higher, but the chance of conflicts is increased. The list length will get longer and longer, and the search efficiency will decrease.
Conversely, the smaller the loading factor, the fewer elements are filled, which has the advantage of: less chance of conflict, but: more wasted space. The data in the table will be too sparse (a lot of space has been expanded before it is used)
The greater the chance of conflict, the higher the cost of lookup.
Therefore, a balance and compromise must be found between “opportunities for conflict” and “space utilization”. This balance and tradeoff is essentially the tradeoff of the famous time-space contradiction in data structures.
If the machine has enough memory and you want to speed up the query, you can set the load factor to be smaller. On the contrary, if the machine is short of memory and there is no requirement for query speed, you can set the load factor to be larger. However, we don’t need to set it. The default value is 0.75.
2. Construction method
Let’s look at a few constructors of a HashMap:
public HashMap(int initialCapacity, Float loadFactor) {if (initialCapacity < 0) throw new IllegalArgumentException("Illegal Initial Capacity: " + initialCapacity); if (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; if (loadFactor <= 0 || Float.isNaN(loadFactor)) throw new IllegalArgumentException("Illegal load factor: " + loadFactor); // Find a power of 2 >= initialCapacity int capacity = 1; While (capacity < initialCapacity) // Make sure the capacity is the NTH power of 2, so that capacity is the smallest power of 2 greater than initialCapacity capacity <<= 1; this.loadFactor = loadFactor; threshold = (int)(capacity \* loadFactor); table = new Entry[capacity]; init(); } public HashMap(int initialCapacity) { this(initialCapacity, DEFAULT_LOAD_FACTOR); } public HashMap() { this.loadFactor = DEFAULT_LOAD_FACTOR; threshold = (int)(DEFAULT_INITIAL_CAPACITY \* DEFAULT_LOAD_FACTOR); table = new Entry[DEFAULT_INITIAL_CAPACITY]; init(); }Copy the code
We can see that when constructing a HashMap the first constructor is called if we specify the loading factor and initial capacity, otherwise the default is used. The default initial capacity is 16 and the default load factor is 0.75. We can see lines 13-15 in the code above. This code ensures that capacity is to the NTH power of 2, making capacity the smallest power of 2 greater than initialCapacity. We will see why capacity is set to the NTH power of 2 later.
Let’s focus on the two most commonly used methods in HashMap: PUT and get
3. Store data
Let’s take a look at how HashMap stores data, starting with the HashMap PUT method:
The above program uses an important internal interface: map. Entry. Each Map.Entry is essentially a key-value pair. As can be seen from the above program, when the system decides to store key-value pairs in the HashMap, it does not consider the value in the Entry at all, but only calculates and determines the storage location of each Entry according to the key. This illustrates the previous conclusion: we can treat a value in a Map set as an accessory to a key, and when the system decides where the key will be stored, the value will be stored there.
PutForNullKey (value); putForNullKey(value); putForNullKey(value); putForNullKey(value)
Note: If the key is null, the hash value is 0 and the object is stored at index 0 in the array. The table [0]
Let’s go back to line 4 of the put method, which evaluates the hash code from the key’s hashCode value. Here’s the function that evaluates the hash code:
// Static int hash(int h) {// This function ensures that hashCodes that differ only by // constant multiples at each bit position have a bounded // number of collisions (approximately 8 at default load factor). h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); }Copy the code
The hash code is used to calculate the index that should be stored in the array.
Record on pit
Pit: Stores characters as keys and occurrences as values in the HashMap, so that when you need to find a character that only occurs once, you only need to find the character whose first occurrence is 1. I used the iterator of the keySet of the HashMap to iterate over it, and the result was correct until GOOgl was typed, but returned e after e was typed. The code is as follows:
The iterator that uses the hashmap cannot find the first character of degree 1. The iterator that uses the hashmap cannot find the first character of degree 1. The iterator that uses the hashmap cannot find the first character of degree 1. Only characters whose first occurrence in the inner array is 1 can be found. Add an array of characters in order, iterate through the array, and get the value of the array as key.