preface

Before we talk about what Hashcode () does, we need to understand what a hash table is. A Hash Table is a data structure called a Hash Table. It is also commonly referred to as a ‘Hash Table ‘. A hash table is a data structure that supports random access using array subscripts. It can also be understood as an extension of an array.

In a school sports meeting, there were 100 players. In order to record the results of each player, the players would be labeled with their own number in turn. Let’s say they’re numbered from 1 to 100. Now we need to program the way, how can we quickly get the results of the designated number of players?

Usually we build a one-dimensional array and place the player numbered 1 at index 0, the player numbered 2 at index 1, and so on, the player numbered 100 at index 99. When we want to get the player’s information numbered x, we just need to get the player’s information from the position marked Y in the array. The time complexity of this process is O(1), isn’t it very efficient?

Extension: Why can an array be O(1) time based on random access to array indices?

In fact, this example already contains the idea of hashing. In this example, the player code x and the array subscript y can form an expression:

Among them, the number of the player is called key or keyword. We use it to identify a player. Then the mapping method that converts the entry number to the array index is called a Hash function, and the value that the Hash function evaluates is called a Hash value.

From the above example, we can summarize the rule: hash table uses the array to support random access by index, time complexity O(1) property. We use the hash table function to map the elements’ keys to their subscripts, and then store the data at the corresponding subscripts in the array. When we need to query the elements of an array, we also need to map the key value of the element to the corresponding array subscript through the hash function, and then find the corresponding element through the array subscript.

How can HashMap query values efficiently

Before we talk about how HashMap can be used to efficiently query a Value, let’s review how HashMap can be used to query a given element in a linear table. If the length of the linear table is 1000, the worst time complexity is O(1000) if the search is sequential, and binary search is used. The worst time complexity also requires O(500).

In a HashMap, it depends a lot on the validity of the hash code. In the internal structure of HashMap, it can be regarded as a composite structure composed of array (Node<K,V>[] table) and linked list. And the data placed in the HashMap and the arrays in the internal structure of the HashMap are correlated by hash().

Using the above example, suppose a hash() expression of x-1 is converted to the mathematical formula y = f(x), where f(x) = x-1 (where x represents the contestant number and y represents the array index). When we know that the number of a player is 1, we just need to convert it through the corresponding hash() function, i.e. 1-1 = 0, and the resulting 0 is the index of the array, so we can quickly get the score information of the player with the number of 1. Similarly, if we need to store the score information of player numbered 50, we just need to substitute 50 into hash(), i.e. 50-1 = 49, and the resulting 49 is the location where player numbered 50 can be stored, as shown in the picture below:

Of course, in a real HashMap, hash() is not as simple as f(x) = x-1 above. As shown below (Java OpenJdk 11) :

As can be seen from the above figure, the subscript of hash table array in HashMap is determined by the length of bucket array N and the hash() function. In HashMap, when we need to store a pair of key-value pairs, we need to put the corresponding key-value into the hash(Object key) and convert it into the corresponding hash value. After the operation with the length of the bucket array, we get the corresponding bucket array subscript. The corresponding Value is then stored at the specified array index. Similarly, if we go to the corresponding Value in the HashMap based on the key Value, then we need to do the reverse operation of the storage.

Hash collision

Before implementing the save and fetch operations of HashMap, we need to resolve hash collisions of HashMap. We know that a hash algorithm can’t have zero collisions, which means that a hash algorithm has hash collisions. This is based on a very basic theory of combinatorics, the pigeonhole principle (also called the drawer principle). It says that if there are 10 nests and there are 11 pigeons, then there must be one nest with more than one pigeon, in other words, there must be two pigeons in the same nest.

In order to solve hash collision, two main solutions have been proposed, namely developing addressing method and linked list method. However, HashMap mainly uses linked list method to solve hash conflicts. The model for resolving hash conflicts in HashMap is shown below:

The default initial bucket array size for a HashMap is 16, and when a hash collision occurs, the value of the collision is added to the list. In JDK1.8, when the length of the linked list is greater than 8, the linked list will be converted to a red-black tree. The red-black tree can be used to quickly add, delete, and change the HashMap to optimize the efficiency. At the same time, when the length of the linked list is less than 8 again, the red-black tree will be transformed into a linked list, because the red-black tree needs to maintain balance. When the number of linked lists is small, the efficiency of HashMap is not obvious.

Why override equals() when overwriting hashCode ()?

Armed with the basics above, it helps to better understand why we need to override equals() when overriding hashCode ().

To use a HashMap, we need to call the HashMap public method put(K key, V value) to pass key-value pairs into the instance of the HashMap. Key Value, Value Value). The diagram below:

In the code above, we customize the IKey class. The IKey class is very simple, starting at line 17. Create IKey objects mIkey_1 and mIkey_2 on line 4 and 5, respectively, and pass mIkey_1 as the key by calling the put() function on an instance of HashMap. Then print the Value of mIkey_2 by calling get(). The Value of mIkey_2 printed by calling get() should be the Value of the key pair I am Key_1, because the id we pass in the Ikey is 5. The Value we get from mIkey_2 should be the same as the Value we passed in from mIkey_1. However, the console print results are completely unexpected, the console print null, why?

Because we didn’t override hashcode() when we defined HashMap, we directly called the hashcode() function in Objct class when we used mIkey_2 to get the Value passed in by mIkey_1. Class Object hashcode() returns the hash value of the memory address of the IKey Object instance. In this article I’ve argued that the hashCode () method in the Object class returns the memory address of the Object.

Obviously, two instances IKey objects mIkey_1, mIkey_2 in virtual machine memory address is different, so a HashMap to query by mIkey_2 mIkey_1 Value Value, Because there is no key value for mIkey_2’s memory address, the console returns null.

As shown above, we override the hashCode () method in our custom IKey, but we run the main method in the IhashMap. Will the console print “I am key_1”?

The answer is no, the console prints null as well. When we call get() in HashMap, we first calculate the index of mIkey_2’s position in the bucket array using the hash value of mIkey_2. If the array element exists, we then determine whether the element hash is the same as that of mIkey_2. As we mentioned earlier, hashing algorithms can’t be conflict-free. This means that even two different IKey object instances may have the same hash value. Therefore, if both conditions are true, IKey’s equals() method needs to be called to determine whether the ids are the same. If the id values are the same, then the key passed in through mIkey_1 can be found by mIkey_2 as the key value. But we didn’t override equals(), so HashMap calls equals() of the Object class.

We know that the equals() method in the Object class compares the memory addresses of two objects. MIkey_1 and mIkey_2 have different memory addresses, so we have overwritten hashCode () but not equals() to find the Value of mIkey_1. Still null.

Looking at the figure above, when we commented out equals() in the IKey, we get the result we want in the console, I am Key_1.

After the speech

At this point, you might be wondering, why don’t I override hashCode () and equals() when I use Integer, String, etc., as keys in HashMap? This is because Java engineers have already written and implemented both methods for us. Interested readers can browse their source code to see.

The original link

Hashcode () inside HashMap

reference

  • Algorithms (Fourth Edition) by Xie Luyun
  • Yuan Chunfeng, Yu Zihao, Fundamentals of Computer Systems, Second Edition
  • Yang Xiaofeng, Lecture 36 of Java Core Technology
  • Wang Zheng, The Beauty of Data Structures and Algorithms