In a recent interview, I was asked about the principle of HashMap in Java. I was at a loss for words, so I decided to do some research.

HashCode and equals in Java

1. About hashCode

  1. HashCode exists mainly for quick lookup, such as Hashtable, HashMap, etc. HashCode is used to determine the storage address of an object in a hash storage structure
  2. If two objects are the same, that applies to equals(java.lang.object), then the hashCode of the two objects must be the same
  3. If the equals method of an object is overridden, then the hashCode of the object should be overridden as well, and the resulting hashCode must be the same as that used in the equals method, otherwise it will violate point 2 above
  4. The equals(java.lang.object) method does not necessarily apply to the equals(java.lang.object) method. The equals(java.lang.object) method does not necessarily apply to the equals(java.lang.object) method. The equals(java.lang.object) method does not necessarily apply to the equals(java.lang.object) method.

HashCode is used for lookups and equals is used to compare whether two objects are equal.

The following interpretation of hashCode is taken from other blogs:

1. Hashcode is for look-up, and if you’ve ever learned about data structures, in the Look-and-sort chapter there’s for example, in memory I have locations like 0, 1, 2, 3, 4, 5, 6, 7 and I have a class, and this class has a field called ID, and I’m going to put this class in one of these eight locations, If you store it arbitrarily without hashcode, then you have to look in each of these eight places, or use something like a dichotomy algorithm. But if you use Hashcode that’s a lot more efficient. We have a field in our class called ID, so we define our HashCode as ID % 8 and store our class where we get the remainder. Let’s say we have ID 9 and the remainder of 9 divided by 8 is 1, so let’s put this class at position 1, and if ID is 13 and the remainder is 5, let’s put this class at position 5. In this way, we can find the location of the class directly by dividing its ID by 8. For example, 9 divided by 8 and 17 divided by 8 both have a remainder of 1. Is this legal? So how do you tell? At this point you need to define equals. In other words, we use HashCode to determine if two classes are stored in a bucket, but the bucket may have many classes, so we need to use Equals to find the class we want in the bucket. Then. Why rewrite hashCode() when you override equals()? If you want to find something in a bucket, you have to find the bucket first. What’s the use of rewriting equals() if you don’t just rewrite hashCode () to find the bucket

2. Equals

1. Equals and == == have different functions for comparing references and base data types: true for base data types if two values are the same, true for references if the reference refers to the same object in memory;

Equals () is used as a method to compare objects. Since the == operator does not allow us to override, that is, it limits our expression. So we override the equals() method to compare whether the contents of the objects are the same. You can’t do this with the == operator.

2. The equals() method of the object class returns true if two objects are of the same type and have the same content: Java. IO. File, Java. Util. Date, Java. Lang. String, wrapper class (Integer, Double, etc.) string s1 = new string (” ABC “); String s2=new String(“abc”); System.out.println(s1==s2); System.out.println(s1.equals(s2)); The result is false true

2. Implementation principle of HashMap

1. A HashMap overview

HashMap is an asynchronous implementation of the Map interface based on hash tables. This implementation provides all optional mapping operations and allows null values and NULL keys. This class does not guarantee the order of the mapping, and in particular it does not guarantee that the order is constant.

In the Java programming language, there are two basic structures, one is an array, the other is a mock pointer (reference), all data structures can be constructed with these two basic structures, HashMap is no exception. A HashMap is actually a “linked list hash” data structure, which is a combination of arrays and lists.

As you can see from the figure above, the underlying HashMap is an array structure, and each item in the array is a linked list. When a HashMap is created, an array is initialized.

Java source code is as follows:

/** * The table, resized as necessary. Length MUST Always be a power of two. */ transient Entry[] table; static class Entry<K,V> implements Map.Entry<K,V> { final K key; V value; Entry<K,V> next; final int hash; ... }Copy the code

Each map. Entry is actually a key-value pair that holds a reference to the next element, which forms a linked list.

2. HashMap implements storage and reading

1) storage

Public V put(K key, V value) {// HashMap allows storing null keys and values. PutForNullKey (); putForNullKey (); putForNullKey (); putForNullKey (); if (key == null) return putForNullKey(value); // Recalculates the hash value based on the key's keyCode. int hash = hash(key.hashCode()); // Search for the index of the hash value in the table. int i = indexFor(hash, table.length); // If the Entry at index I is not null, loop through the next element of element e. for (Entry<K,V> e = table[i]; e ! = null; e = e.next) { Object k; If (e.h ash = = hash && ((k = e.k ey) = = key | | key. Equals (k))) {/ / if it is found that the existing keys, then store the new value, and return the original value V oldValue = e.v alue. e.value = value; e.recordAccess(this); return oldValue; }} // If Entry at index I is null, there is no Entry yet. modCount++; // add key and value to index I. addEntry(hash, key, value, i); return null; }Copy the code

Based on the hash value of the element’s position in the array (that is, the index), if there are already other elements at that position in the array, the elements at that position will be stored as a linked list, with the new ones at the head and the first ones at the end. If there is no element at that location in the array, the element is placed directly at that location in the array.

The hash(int h) method recalculates a hash based on the key’s hashCode. This algorithm adds high-order calculation to prevent hash conflicts caused by constant low-order and high-order changes.

 static int hash(int h) {
     h ^= (h >>> 20) ^ (h >>> 12);
     return h ^ (h >>> 7) ^ (h >>> 4);
 }
Copy the code

We can see that to find an element in a HashMap, we need to calculate the position in the array based on the hash value of the key. How you compute that position is the hash algorithm. As mentioned above, the data structure of HashMap is the combination of array and linked list, so of course we want the positions of the elements in the HashMap to be as evenly distributed as possible, so that the number of elements in each position is only one, so when we use the hash algorithm to solve this position, Immediately, we can know that the element in the corresponding position is the one we want, and we don’t have to go through the list, which greatly optimizes the efficiency of the query.

When a program attempts to put a key-value pair into a HashMap, it first decides where to store the Entry based on the return value of the key’s hashCode() : If two Entry keys return the same hashCode() value, they are stored in the same location. If the keys of the two entries return true through equals comparison, the value of the new Entry overwrites the value of the original Entry in the collection, but the key does not. If the keys of the two entries return false through equals, the new Entry will form an Entry chain with the original Entry in the collection, And the new Entry is at the head of the Entry chain — see the addEntry() method for details.

In this way, HashMap conflicts can be effectively resolved.

2) read

public V get(Object key) { if (key == null) return getForNullKey(); int hash = hash(key.hashCode()); for (Entry<K,V> e = table[indexFor(hash, table.length)]; e ! = null; e = e.next) { Object k; if (e.hash == hash && ((k = e.key) == key || key.equals(k))) return e.value; } return null; }Copy the code

When we get elements from a HashMap, we first evaluate the hashCode of the key, find an element at the corresponding position in the array, and then use the key’s equals method to find the desired element in the linked list at the corresponding position.

3) In a nutshell, the underlying HashMap treats key-value as a whole, which is an Entry object. At the bottom of HashMap, an Entry[] array is used to store all key-value pairs. When an Entry object needs to be stored, its storage position in the array is determined by the Hash algorithm, and its storage position in the linked list is determined by the equals method. When an Entry needs to be retrieved, the hash algorithm is used to find its storage location in the array, and the equals method is used to retrieve the Entry from the linked list at that location.

3, HashMap resize

As the number of elements in a HashMap increases, the probability of collisions increases (because the array length is fixed), so to improve query efficiency, we need to expand the array of a HashMap. Array enlargement also occurs in an ArrayList, so this is a general operation. A lot of people have been skeptical about its performance, but think about the principle of “amortization.” After the hashMap array is expanded, the most performance costly point occurs: the data in the original array must be recalculated and placed in the new array, which is called resize.

So when will hashMap be expanded? Array expansion occurs when the number of elements in the hashMap exceeds the array size loadFactor. The default value of loadFactor is 0.75, that is, the default array size is 16. The array size is doubled to 216=32, and the position of each element in the array is recalculated. This is a very performance expensive operation, so if we already know the number of elements in the HashMap, the preset number of elements can effectively improve the performance of the HashMap. For example, we have 1000 elements new HashMap(1000), but theoretically new HashMap(1024) is more appropriate, but as Annegu said above, HashMap automatically sets it to 1024 even if it is 1000. New HashMap(2048) (1024) is not suitable because 0.751000 < 1000. Resize issues are also avoided.

Iii. Summary: Implementation principle of HashMap:

  1. Rehash the index of the current object’s element in the array using the key’s hashCode
  2. If keys with the same hash value are used during storage, there are two scenarios. (1) If the key is the same, the original value is overwritten; (2) If the key is different (conflict occurs), the current key-value will be put into the linked list
  3. When obtaining the hash value, it directly finds the subscript corresponding to the hash value and further checks whether the key is the same to find the corresponding value.
  4. After understanding the above process, it is not difficult to understand how HashMap solves the problem of hash conflicts. The core is to use the storage mode of array, and then put the objects of the conflicting keys into the linked list. Once the conflicts are found, further comparison will be made in the linked list.

Four, write to the end

You already know how to use HashMap in Java. Point attention, don’t get lost, pay attention to programmer Zeng Zeng, share different Java knowledge every day. For more Java basics, I put together a GitHub repository of my own:Java little White training manual, you can check it yourself if you need to