1. Introduction to hashCode

hashCode() returns an object's hash code (also called a hash value), which is an int. The hash code is used to determine the index of the object in a hash table. If you read my previous article on hash tables, the hash code plays the same role as looking up the address of a name key k in the table based on its first letter. hashCode() is defined in the JDK's Object.java, which means every class in Java has a hashCode() method.
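As a small illustration of the point that every object has a hashCode() returning an int, here is a minimal sketch. The class name HashCodeIntro and the helper hashOf are made up for this example.

```java
// Minimal sketch: every Java object exposes hashCode(), and the result is an int.
// HashCodeIntro and hashOf are hypothetical names used only for this example.
class HashCodeIntro {
    static int hashOf(Object o) {
        return o.hashCode();
    }

    public static void main(String[] args) {
        // String overrides hashCode() with a documented formula, so this value is stable.
        System.out.println(hashOf("abc"));               // 96354
        // Integer's hash code is the wrapped int value itself.
        System.out.println(hashOf(Integer.valueOf(42))); // 42
        // A plain Object falls back to the default implementation; the value varies per run.
        System.out.println(hashOf(new Object()));
    }
}
```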

2. The role of hashCode

Arrays are the most efficient data structure in Java for lookups, but that efficiency comes with caveats. First, we need to know the index of the element we are looking for. Second, if we have to search by iterating, the amount of data must be small. For large amounts of data, a collection is generally recommended.

Two commonly used kinds of Java collection are List and Set. The elements of a List are ordered and may repeat, while the elements of a Set are unordered and may not repeat. Lists are easy to handle, but how does a Set make sure its elements do not repeat? The naive answer is to iterate over the existing elements and compare each one with equals(). That is acceptable for a small amount of data, but the cost grows quickly as the data grows (although algorithms can optimize it). For example, if we insert 1000 elements into a HashSet, do we really want each insertion to iterate over the existing elements, calling equals() up to 1000 times? hashCode provides a solution. How? Let's start with the declaration of hashCode in Object:

    public native int hashCode();

It is a native method, so its implementation depends on the JVM and the platform. When we add an element to a hash-based collection, the collection first calls the element's hashCode() to locate the position where it should be stored. If that position is empty, the element is stored there. If it already holds elements, equals() is called to check whether the new element matches any of them. If it does, the element is not stored again; if it does not, it is stored at another location. This greatly reduces the number of equals() calls when we store a large number of elements, and so greatly increases efficiency.
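To make the saving concrete, the following sketch counts equals() calls for two ways of keeping 1000 distinct elements free of duplicates: scanning an ArrayList before each insert, and adding to a HashSet. The Key class and the counter are hypothetical helpers written for this illustration, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class: counts equals() calls for list-based and hash-based deduplication.
class EqualsCallDemo {
    static long equalsCalls = 0;

    // Hypothetical key type that records every equals() invocation.
    static final class Key {
        final int id;

        Key(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() { return Integer.hashCode(id); }
    }

    // Deduplicate by scanning a list: each insert is compared against the stored elements.
    static long callsWithList(int n) {
        equalsCalls = 0;
        List<Key> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Key k = new Key(i);
            if (!list.contains(k)) list.add(k);
        }
        return equalsCalls;
    }

    // Deduplicate with a HashSet: hashCode() narrows the search to one bucket first.
    static long callsWithHashSet(int n) {
        equalsCalls = 0;
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < n; i++) set.add(new Key(i));
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("list equals() calls:    " + callsWithList(1000));
        System.out.println("HashSet equals() calls: " + callsWithHashSet(1000));
    }
}
```

With the list, the number of comparisons grows roughly with the square of the element count; with the HashSet, equals() is only needed for elements whose hash codes actually match.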

So the role of hashCode is to find the region of a collection in which an object lives. A hash-based collection is divided into several regions (buckets). Every object can compute a hash code, hash codes are grouped, and each group corresponds to one storage region. An object's hash code therefore determines the region in which it is stored, which greatly reduces the number of elements that have to be compared and improves lookup efficiency.

3. The importance of hashCode to an object

Is hashCode important? Not for List collections or arrays, but for HashMap, HashSet, and Hashtable it is extremely important. So be careful with hashCode when using HashMap, HashSet, or Hashtable. For an object, hashCode() is a simple implementation of a hash algorithm, and that implementation matters a great deal to how the object is stored and retrieved.

Take Hashtable as an example to illustrate the importance of hashCode for an object.

An object is bound to have several attributes, and choosing which of them to hash is a test of design skill. Hashing every attribute can be a poor design when there are many of them, because the object's hashCode method is called all the time, and the more attributes take part, the longer each call takes, which can seriously affect the performance of the program. However, if too few attributes take part in the hash, the hash values become less diverse, which produces a large number of hash "conflicts" (collisions). These waste space and also reduce lookup efficiency to some extent. The two goals pull against each other: more diverse hashes cost more time to compute.

How best to design an object's hashCode is something I have no experience with. One solution I found online is to cache the hash code in a field: recompute it only when an attribute that participates in the hash changes, and otherwise return the cached value. This can improve performance considerably.
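The caching idea can be sketched roughly as follows. This is one possible reading of it, not a recommended pattern: the class CachedHashPoint, its fields, and the computations counter are all invented for the example, and the sketch is not thread-safe.

```java
import java.util.Objects;

// Hypothetical class illustrating a cached hash code that is recomputed only after a change.
class CachedHashPoint {
    private int x;
    private int y;
    private int cachedHash;     // last computed hash code
    private boolean hashValid;  // false until first computed, and again after any setter call

    int computations = 0;       // demonstration only: how often the hash was really computed

    CachedHashPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Every change to an attribute that participates in the hash invalidates the cache.
    void setX(int x) { this.x = x; hashValid = false; }
    void setY(int y) { this.y = y; hashValid = false; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedHashPoint)) return false;
        CachedHashPoint p = (CachedHashPoint) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        if (!hashValid) {
            cachedHash = Objects.hash(x, y);
            hashValid = true;
            computations++;
        }
        return cachedHash;
    }

    public static void main(String[] args) {
        CachedHashPoint p = new CachedHashPoint(1, 2);
        p.hashCode();
        p.hashCode();                        // served from the cache
        System.out.println(p.computations);  // 1
        p.setX(5);                           // invalidates the cache
        p.hashCode();
        System.out.println(p.computations);  // 2
    }
}
```

Note that changing an object while it is stored as a key in a hash-based collection still breaks lookups, cached hash or not, because the collection filed it under the old hash code.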

Hashtable computes the index of an object in its table[] array like this:

    int index = (hash & 0x7FFFFFFF) % tab.length;

Why 0x7FFFFFFF? Because some objects have a negative hashCode, and the bitwise AND with 0x7FFFFFFF clears the sign bit so that index is never negative. In theory, then, we could use hashCode to locate an object directly in the hash table. So why do we need key-value pairs at all? Why not store the value directly under the key's hashCode? This touches on the most important aspect of Hashtable performance: hash collisions! (See article)
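The effect of the mask is easiest to see with a negative hash value. The sketch below reuses the index expression from above; the class name BucketIndexDemo, the sample hash -7, and the table length 11 are assumptions chosen only for illustration.

```java
// Hypothetical demo of the index expression (hash & 0x7FFFFFFF) % length.
class BucketIndexDemo {
    // Same expression as in the text; tableLength stands in for tab.length.
    static int indexFor(int hash, int tableLength) {
        return (hash & 0x7FFFFFFF) % tableLength;
    }

    public static void main(String[] args) {
        int hash = -7;         // stand-in for a negative hashCode()
        int tableLength = 11;  // assumed table size for this sketch

        // A plain remainder keeps the sign of the dividend, so it cannot be used as an array index.
        System.out.println(hash % tableLength);                       // -7
        // Clearing the sign bit first always yields an index in [0, tableLength).
        System.out.println(indexFor(hash, tableLength));              // 6
        // Even the most negative int maps to a valid index.
        System.out.println(indexFor(Integer.MIN_VALUE, tableLength)); // 0
    }
}
```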

We know that a collision occurs when different objects produce the same hash code. hashCode returns an int, so its value is limited to the int range. What if we store more objects than there are int values? Then at least two of them must have the same hash code, and the two objects end up at the same location, where we can tell them apart by the keys themselves. So when several objects share an index, we rely on both the key's hashCode and the key itself to distinguish them, and the key comparison uses equals().
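Collisions do not require huge amounts of data. The strings "Aa" and "BB" are a well-known pair with the same hash code, and the sketch below shows that a Hashtable still keeps them apart because it also compares the keys with equals(). The class name CollisionDemo and the stored values are made up for this example.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demo: two different keys with the same hash code stored in one Hashtable.
class CollisionDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> table = new Hashtable<>();
        table.put("Aa", 1);
        table.put("BB", 2);
        return table;
    }

    public static void main(String[] args) {
        // Both strings hash to the same int, so they land at the same index.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112

        // equals() on the keys tells the two entries apart.
        Map<String, Integer> table = build();
        System.out.println(table.get("Aa")); // 1
        System.out.println(table.get("BB")); // 2
        System.out.println(table.size());    // 2
    }
}
```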

4. equals and hashCode

In a nutshell:

  • If two objects are equal, their hash codes must also be the same
  • If two objects are equal, calling equals on them returns true in either direction
  • Two objects with the same hashCode value are not necessarily equal
  • Therefore, if equals is overridden, hashCode must be overridden as well
  • The default behavior of hashCode() is to generate a value based on the identity of the object on the heap, which is typically different for different objects. If hashCode() is not overridden, two distinct objects of the class will generally not have the same hash code (even if they hold the same data), so hash-based collections will not treat them as equal
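The rules above can be checked with a small experiment. In this sketch, BrokenName overrides only equals() while GoodName overrides both methods. Both classes and the helper foundAfterAdd are hypothetical names written for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo of what happens when equals() is overridden without hashCode().
class ContractDemo {
    // Overrides equals() only, so equal objects usually get different hash codes.
    static final class BrokenName {
        final String value;

        BrokenName(String value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof BrokenName && ((BrokenName) o).value.equals(value);
        }
        // hashCode() is deliberately NOT overridden here.
    }

    // Overrides both methods consistently.
    static final class GoodName {
        final String value;

        GoodName(String value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof GoodName && ((GoodName) o).value.equals(value);
        }

        @Override
        public int hashCode() { return value.hashCode(); }
    }

    // Stores one object in a HashSet and looks it up with an equal but distinct object.
    static boolean foundAfterAdd(Object stored, Object probe) {
        Set<Object> set = new HashSet<>();
        set.add(stored);
        return set.contains(probe);
    }

    public static void main(String[] args) {
        // equals() says the two objects are equal...
        System.out.println(new BrokenName("k").equals(new BrokenName("k")));        // true
        // ...but the set looks in the wrong bucket, so this is false in practice.
        System.out.println(foundAfterAdd(new BrokenName("k"), new BrokenName("k")));
        // With a matching hashCode(), the lookup succeeds.
        System.out.println(foundAfterAdd(new GoodName("k"), new GoodName("k")));    // true
    }
}
```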

The object comparison process is as follows: