In recent interviews, after asking about the data structure of HashMap, it’s common to ask another question: Why do you overwrite hashCode when overwriting equals?

This question essentially goes back to the HashMap scenario to see if the interviewer really understands it. In this article, we will take a look at the relationship between equals and HashCode methods.

Equals and Hashcode exist

Each class has an equals method and a HashCode method. Because all classes inherit from the Object class. The Object class is defined as follows:

public boolean equals(Object obj) {
    return (this == obj);
}
    
public native int hashCode();
Copy the code

Intuitively, equals compares references to objects by default, using “==”. The hashCode method is a native method that returns an integer.

Both methods are not final and can be overridden.

The equals() and hashCode() methods have been overridden for common classes like String, Math, Integer, and Double.

The equals () method

The equals() method is used to determine whether two objects are equal. Object implements equals by default, but it obviously doesn’t fit the needs of personalization, so it often needs to be overwritten. For the common String class, override equals as follows:

Public Boolean equals(Object anObject) {if (this == anObject) {return true; } if (anObject instanceof String) { String anotherString = (String)anObject; int n = value.length; if (n == anotherString.value.length) { char v1[] = value; char v2[] = anotherString.value; int i = 0; while (n-- ! = 0) { if (v1[i] ! = v2[i]) return false; i++; } return true; } } return false; }Copy the code

The comparison here is no longer just an address comparison. First compare by address, if the address is the same then it must be the same object. Compare the contents of the char array if the addresses are different, and return true if they are exactly equal.

Attributes of equals() method

There are comments on the equals() method of the Object class that explain some of the properties that equals() needs to satisfy:

  • Reflexive. For any non-null reference value x, x.equals(x) must be true;
  • Symmetric. For any non-null reference values x and y, y.quals (x) is true if and only if x.quals (y) is true.
  • Transitive. For any non-null reference values x, y, and z, if x. quals(y) is true and y. quals(z) is true, then x. quals(z) must be true.
  • Consistent. For any non-null reference values x and y, x.equals(y) returns either true or false on multiple calls to equals if the object information used for the comparison has not been modified;
  • For any non-null reference value x, x.equals(null) returns false;

By comparison with the above characteristics, we find that the Object method directly compares two reference addresses, and only two addresses are equal, that is, the equivalence relation with the greatest possibility of difference.

The Equals method of strings, on the other hand, includes not only cases where the application addresses are the same, but also cases where the String values stored within them are the same. That is, two strings with the same String value return true. This is exactly what we call the “equals method comparing values” in most cases.

Since there is a default exception to equals of objects, we can’t say that equals compares concrete values and “==” compares references without a custom equals method.

HashCode () method

The hashCode() method returns a hash code value of the object. This method is used with hash tables, such as HashSet, HashMap.

HashCode () is a native method that returns values of an integer type and can be overridden.

The native hashCode() method in Object returns the Object’s address in memory as a hashCode, ensuring that different objects return different values.

Also take the String class, whose hashCode method is:

Public int hashCode() {int h = hash; if (h == 0 && value.length > 0) { char val[] = value; for (int i = 0; i < value.length; i++) { h = 31 * h + val[i]; } hash = h; } return h; }Copy the code

The basic formula for calculating the hash value is as follows: s[0]*31^(n-1) + S [1]*31^(n-2) +… + s/n – 1).

Where s[I] is the ith character of the string, n is the length of the string, and ^ denotes exponentiation (the hash code for the empty string is 0).

The number 31 was used in the calculation for the following reasons:

1. Due to the property of prime numbers, when multiplied by other numbers, the probability of a unique result is higher and the probability of hash conflicts is lower.

2. The larger the number of primes used, the smaller the probability of hash conflicts, but the slower the calculation speed; 31 is a compromise between hash conflict and performance, and is actually the result of experimental observations.

The JVM automatically optimizes 31:31 * I == (I << 5) -i;

What the hashCode() method does

The hashCode() method mentioned earlier is mainly used in hash tables, such as HashSet, HashMap, and so on.

Let’s take a look at an ArrayList, which is an array at the bottom, and each piece of data is stored in the underlying array, without having to determine whether the data is duplicated.

The elements in Set are unordered and non-repeatable, so how do you ensure that the stored elements are not duplicated? Call equals() one by one to compare? It works fine with a small amount of data, but with a large amount of data the time complexity is basically O(n), and you have performance problems.

In Java, hash algorithm is used to solve this problem. The object (or data) is directly mapped to an address according to a specific algorithm. In this way, the time complexity tends to O(1), and the object access efficiency is greatly improved.

When an element is added to Set, call hashCode() first to locate the actual location of the element. If there are no elements in this location, it means that it is the first time to store the element. If there is an object at this location, call equals() to compare, discard the element if it is equal, and hash to another address if it is not.

The above example also shows why hashCode() must be equal if equals() is equal, and then when you override equals, you override hashCode() as well.

The basic processing mechanism of a HashMap is similar to that of a HashSet, except that the underlying data store structure is different.

In short, HashCode greatly reduces the number of object comparisons and improves the efficiency of collection lookups.

Properties of the hashCode() method

The hashCode implementation also requires that the equals method of Object be annotated:

  • During the execution of a Java application, an object calls the hashCode() method multiple times if the information it provides to equals to compare has not been modified. The method must always return the same INTEGER.
  • If two objects are equal according to equals(Object), then calling their respective hashCode() methods must produce the same INTEGER result.
  • It is not required that two objects are not equal according to the equals(java.lang.object) method, and that calling their respective hashCode() methods must produce different INTEGER results. However, producing different INTEGER results for different objects may improve hash table performance.

How to rewrite hashCode()

A simple generic hashCode algorithm is provided in Effective Java.

Int result = 17; int result = 17; int result = 17;

B. Select all equals fields (only equals() fields for comparison), and then calculate the attributes of each field:

  • (1) If Boolean, calculate f? 1-0
  • (2) if byte\char\short\int, calculate (int)f
  • (int)(f ^ (f >>> 32))
  • Float. FloatToIntBits (f)
  • Double. DoubleToLongBits (f); return long; use rule (3) to process long and return int
  • (6) In the case of object applications, if the equals method is used to call the comparison recursively, then hashCode will also be used to call hashCode recursively. Otherwise you need to compute a normal form for the field, such as zero for hashCode when the field is null
  • (7) If it is an array, then each element needs to be treated as a separate field. Java. Util. Arrays. The hashCode method contains 8 kinds of basic types of Arrays and references of hashCode calculation, the above algorithm.

C. Finally, merge the hash code of each field into the hash code of the object.

summary

What is clear about the equals method is that it is used to compare whether two objects are equal. For the hashCode method, the focus is on improving efficiency in a hashMap-like scenario, which is just a technical requirement.

The equals method is commonly used in collections to compare objects for equality, and the hashCode method is used to solve performance problems that occur when large data volumes occur.

In practice, we rarely use an Object as a Map key, because if the attributes of the Object change, the hashCode will change and the corresponding value may not be found, and String is an immutable Object, so it works well as a key.