Basics: Introduction to hashCode() and equals()

Before we look at the relationship between hashCode() and equals(), it’s worth taking a look at each of them individually.

  • The equals() method, which compares whether two objects are equal, is fundamentally different from the “==” equality comparator. In the all-object Java architecture, the system gives the programmer the power to determine whether objects are equal by writing the equals() method to the Object class and having all classes inherit the Object class. This allows programmers to customize equals() methods in their classes to implement their own business logic.
  • You can tell the difference between equals() and “==” — see this article

 

  • A hashCode() is a hash value. A hash value is the result of a hash function that guarantees that the same input will produce the same output (hash value), but does not guarantee that different inputs will always produce different output. Hash collisions occur when the input sample size is large enough, that is, different inputs produce the same output.
  • Conflicts aside, the fact that the same input can produce the same output is invaluable. It makes the system only need through simple operation, in the case of time complexity O(1) can get the mapping relationship of data, according to this characteristic extended hash table data structure.
  • A mainstream hash table implementation is: the array as the hash function output field, the input value through the hash function calculation to get the hash value, and then according to the hash value in the array to find the corresponding storage unit. When a conflict occurs, the storage unit stores the conflicting data in a linked list.

 

HashCode () and equals()

Let’s introduce the relationship between hashCode() and equals() from a macro perspective

  • In most programming practices, this boils down to data access. In the assembly language era, you had to honestly write access statements for every data operation; Today, we all code in high-level languages like Java. In addition to having the core idea of object-oriented, Java also encapsulates a series of APIS for manipulating data, which provides great convenience for programming.
  • But before we operate on the data, we must first store the data in the storage unit according to a certain data structure, otherwise it will be impossible to operate the data. However, different data structures have their own characteristics, so we need to choose the data structure suitable for our own storage. Java provides a variety of container classes based on different data structures, making it easy for programmers to choose the appropriate container classes for business development.
  • Java container classes are divided into Collection and Map, and Collection can be further divided into List and Set. Map and Set do not allow duplicate elements. Strictly speaking, Map stores key-value pairs and does not allow duplicate key values. It is worth noting that most Map and Set implementation classes use hash tables at the bottom.
  • At this point we extract two keywords that do not allow repetition and hash structure, review the characteristics of hashCode() and equals(), and do you have anything in mind?

 

Decryption: Dig deeper into the relationship between hashCode() and equals().

  • As mentioned above, sets and maps do not hold duplicate keys, so when storing an element, you must make a decision about the element: Are there any elements in the current container that are identical to the new element? .
  • You might be thinking: That’s easy, why don’t you just call equals() on the element objects to compare them? This is a good idea if the number of objects in the container is small, but if the number of objects in the container reaches a certain size, it’s not easy to call equals() of all the objects in the container and compare them with the new elements, even if the comparison logic is simple. This is also an O(n) operation.

 

  • But it’s much easier to determine if the new object is the same as any other object in the container based on a hash table. Since each object comes with hashCode(), this hashCode will be used as input to the hash function that evaluates the hashCode to the hash value, and the new object will be stored in the corresponding memory location based on the hash value.
  • We might assume that two of the same object, and hashCode () must be the same, so it reflects the power of hash function, since the same input will produce the same output, so if a new object and the existing object in the same container, a new object and calculate the hash value will be existing object’s hash value conflict, At this point, the container can determine that the new element already exists and needs to be handled differently (overwriting the original element (key) or discarding it).
  • The idea is that if the hash value of the element has no conflicting addresses, that is, no duplicate elements, it can be inserted directly. Therefore, when using hashCode(), the cost of determining whether there are the same elements is only one hash calculation with time complexity of O(1), which greatly improves the storage performance of data.

 

  • However, we also mentioned earlier that when the input sample size is large enough, different inputs will produce the same output, that is, the formation of hash conflict. The original rule that “if there is a conflict, it means two objects are the same” is broken instantly, and the conflict is likely to be between two different objects!
  • Thankfully, in addition to the hashCode() method, we have an ace in the hole: equals(). This means that when two different objects have hash conflicts, we can use equals() to further determine whether the two objects are the same. This is where the equals() method is important, in which case it must be able to determine that the two objects are not the same.
  • This brings us to some important principles of Java programming:
  • If two objects are equal, their equals() method should return true, and their hashCode() should return the same result.
  • But sometimes interview questions aren’t so straightforward. They’ll ask you: If two objects have the same hashCdoe(), their equals() method must return true, right?
  • That can’t be the right answer. Since we cannot guarantee that every programmer will follow the coding rules, it is possible that hashCode() for two different objects will return the same result. This is easy to answer if you understand the above: Two objects that have the same hashCode() will cause hash collisions in the hash table in the future, but they don’t have to be the same object. When hash conflicts occur, we also need to further determine whether two objects are the same via equals(), which does not necessarily return true.
  • This is why Java officially recommends that we override both hashCode() and equals() methods.

 

Verification: Verify the relationship between HashMap source code and official documents.

The above text is a reflection of my own thinking. It is valid but not entirely reliable. Let’s verify this with the source code for HashMap (JDK1.8) and the official documentation.

  • Reading the official JDK8 documentation, we find the end of the equals() method description:

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.

  • The official documentation reminds us that when overriding equals, it’s best to also override hashCode(). That is, if we override equals to determine that two objects are the same, they should also have the same hash code for hashCode() to do its job.
  • So how does it work? We combine some of the more commonly used HashMap source code for further analysis (like HashSet is also implemented through HashMap).
  • The most commonly used method in hashmaps is put().
public V put(K key, V value) {
    return putVal(hash(key), key, value, false.true);
}
Copy the code
  • We can see that the put() method actually calls putVal(), so follow up:
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // When we create the HashMap object, no table space is allocated in memory for the HashMap, and the resize() method is called to initialize the table until we add elements to the HashMap put
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;// Specify the length of the table
        
    //((n-1) &hash) determines the position of the element to put, and if the place to insert is empty, it can be inserted directly.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {// If a conflict occurs, insert the element at the end of the list at the conflict position
        Node<K,V> e; K k;
        if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
            / / the key!!!!!! When determining whether the newly added element is the same as the existing one, we first check the hash value and then call equals(). If the hash value is different, skip it
            e = p;
        else if (p instanceof TreeNode)// If the conflict resolution has become a red-black tree, add nodes according to the red-black tree strategy.
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {// The way to resolve conflicts is still linked lists
            for (int binCount = 0; ; ++binCount) {// Find the end of the list and insert.
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);// Determine the length of the list after insertion. If a certain value is reached, it may be converted to a red-black tree.
                    break;
                }// Whether the current key is the same as the passed key is constantly checked during the traversal process. The first condition for judging is still the hash value.
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    break; p = e; }}if(e ! =null) { // existing mapping for key
            V oldValue = e.value;
            if(! onlyIfAbsent || oldValue ==null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;// The number of map modifications increases
    if (++size > threshold)// If the capacity of the hashMap reaches a certain value, it needs to be expanded
        resize();
    afterNodeInsertion(evict);
    return null;
}
Copy the code
  • We can see that whenever we check whether the key is the same or not, we first check the hash value, and if the hash value is the same (creating a conflict), then check whether the object referenced by the key is the same, and finally check through equals().
  • If the hash value of the key is different, the following judgment will not be performed, and the two objects are directly determined to be different.
if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
    e = p;
Copy the code

The end of the

  • HashCode () and Equals () methods.
  • I had a question earlier, and one you might have after reading this article: I use equals() on a regular basis, so I know it has other uses besides being closely related to hashCode(). But what about hashCode(), does it have a purpose other than its close association with equals()?
  • After searching the Internet, my answer so far is no. That is, hashCode() is useful only in hash tables, not in other cases.
  • Of course, if this answer is not correct, or you have other thoughts, welcome to leave a message to communicate with me ~