To understand hashCode thoroughly, I drilled into the JDK source code

Today we’re going to talk about the hashCode() method in Java — from the source perspective. As we all know, Java is an object-oriented programming language. All classes inherit from Object by default.

The Object class contains the hashCode() method:

@HotSpotIntrinsicCandidate
public native int hashCode(a);
Copy the code

This means that all classes have a hashCode() method that returns a value of type int. Since the hashCode() method is a native method (a native keyword modified method implemented in C/C++ and called by Java), it means that there is no specific implementation given in the Object class.

Specific implementation can refer to the JDK/SRC/hotspot/share/runtime/synchronizer. The CPP (source code can be downloaded on to making its warehouse). The get_next_hash() method uses the value of the hashCode to determine which hash generation strategy to adopt.

And hashCode () method is decorated @ HotSpotIntrinsicCandidate annotations, it is a set of effective implementation in the HotSpot virtual machine, based on the CPU instructions.

I exported at the NuggetsMore than 100 articles on Java, the total number of words more than 300,000 words, humorous content, easy to understand, gained a lot of beginner’s recognition and support, including Java syntax, Java collection framework, Java concurrent programming, Java virtual machine and other core content.In order to help more Java beginners, I reorganized and open sourced these articles on GitHub under the titleTeach the younger sister to learn Java”Doesn’t that sound interesting?

GitHub: github.com/itwanger/jm…

Why does an Object class need a hashCode() method?

In Java, the main purpose of the hashCode() method is to work with hash tables.

A Hash Table, also known as a Hash Table, is a data structure that can be accessed directly by key-value. The most important feature of a Hash Table is that it can be quickly searched, inserted, and deleted. And the algorithm that’s used is called a hash, which is to take an input of any length, and turn it into an output of a fixed length, and that output is the hash. Hash algorithms like MD5 and SHA1 are used.

Java examples like HashSet, Hashtable (note the lowercase T), and HashMap are based on concrete implementations of hash tables. One of the most typical examples is HashMap, which is not only frequently asked by interviewers, but also frequently used on the job.

If you think about it, what if you didn’t have a hash table, but you needed a data structure that didn’t allow for duplicate data?

How about using the equals() method to do the comparison one by one? Such a scheme is certainly feasible. However, if the amount of data is very, very large, it is very inefficient to use equals() method to compare one by one, and the best solution is a hash table.

Take HashMap. When we want to add an object to it, we first call the object’s hashCode() method, get the corresponding hash value, and then put the hash value and the object together into the HashMap. When we want to add a new object:

Get the hash value of the object;
Compare it to the existing hash, and if it’s not equal, store it in;
If they are equal, call them againequals()Method to compare objects. If they are equal, they are not saved.
If not, it means the hash is in conflict. Add a linked list to store the new object.
If the list is longer than 8, it is processed as a red-black tree.

In this way, the frequency of calling equals() methods is greatly reduced. In other words, as long as the hash algorithm is efficient enough to minimize the frequency of hash collisions, hash tables are particularly efficient.

Take a look at the HashMap hash algorithm:

static final int hash(Object key) {
    int h;
    return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code

The object’s hashCode() method is called, then the value is shifted to the right, followed by xOR.

In general, strings are used to hash a HashMap, so let’s take a look at String’s hashCode() method:

public int hashCode(a) {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                : StringUTF16.hashCode(value);
    }
    return h;
}
public static int hashCode(byte[] value) {
    int h = 0;
    int length = value.length >> 1;
    for (int i = 0; i < length; i++) {
        h = 31 * h + getChar(value, i);
    }
    return h;
}
Copy the code

As you can imagine, with such a complex set of calculations and the master-class design of the JDK authors, I believe the probability of hash collisions has been reduced to a minimum.

Of course, in theory, it is possible for two different objects to have the same value computed by the hashCode() method. Therefore, you cannot use the hashCode() method to determine whether two objects are equal. You must use the equals() method.

In other words:

If two objects callequals()Method returns true and callshashCode()Method must be equal;
If two objects callhashCode()Method is not equalequals()Method must result in false;

On the other hand:

If two objects callequals()Method returns falsehashCode()The results obtained by the method are not necessarily unequal;
If two objects callhashCode()Method is calledequals()Method does not give true;

Look at the following code.

public class Test {
    public static void main(String[] args) {
        Student s1 = new Student(18."Zhang");
        Map<Student, Integer> scores = new HashMap<>();
        scores.put(s1, 98);
        System.out.println(scores.get(new Student(18."Zhang"))); }}class Student {
    private int age;
    private String name;

     public Student(int age, String name) {
         this.age = age;
         this.name = name;
     }

     @Override
     public boolean equals(Object o) {
         Student student = (Student) o;
         returnage == student.age && Objects.equals(name, student.name); }}Copy the code

We overwrite the equals() method of the Student class so that if two students have the same age and name, we assume they are the same Student, which is ridiculous, but we are being sloppy.

In the main() method, the 18-year-old Zhang SAN scored a good 98 on the test, so we put the sum of the scores into the HashMap and prepare to output zhang SAN’s score:

null
Copy the code

Unfortunately, the result is null instead of 98 as expected. Why is that?

The reason is that hashCode() is not overridden when overriding equals(). By default, the hashCode() method is a local method that returns the object’s storage address. Obviously s1 in put() and new Student(18, “three “) in get() are two objects whose storage addresses must be different.

The get() method of a HashMap calls hash(key.hashcode ()) to calculate the hash value of the object. The probability of two different hashCode() results being hash() to the same result is extremely small. Get (new Student(18, “three “))) does not get the expected value of 18.

How to solve this problem? Simply override the hashCode() method.

 @Override
 public int hashCode(a) {
     return Objects.hash(age, name);
 }
Copy the code

The Hash () method of the Objects class can generate new hashCode() values for a different number of arguments.

public static int hashCode(Object a[]) {
 if (a == null)
     return 0;

 int result = 1;

 for (Object element : a)
     result = 31 * result + (element == null ? 0 : element.hashCode());

 return result;
}
Copy the code

The code seems simple, and the resulting mathematical formula is as follows (n is the length of the string).

Note: 31 is an odd prime number, which is not large or large. Generally, primes are very suitable for hash calculation. Even numbers are equivalent to shift operation, which is easy to overflow and cause data loss.

That means that if you have the same age and the same name, you’re going to get the same hash. Get (new Student(18, “score three “)) returns the expected value of 98.

There’s a passage in the Bible, The Idea of Java Programming, that describes the hashCode() method.

The most important factor in designing hashCode() is that whenever hashCode() is called on the same object, it should generate the same value. If adding an object to a HashMap using the put() method yields a hashCode() value, and fetching it using the get() method yields another hashCode() value, then the object cannot be retrieved. So, if your hashCode() method relies on mutable data in an object, users should be on the lookout, because when that data changes, hashCode() generates a different hash value, equivalent to a different key.

That is, if a field in an object is prone to change when overwriting hashCode() and equals() methods, then it is best to discard those fields to avoid unexpected results.

Ok. With that in mind, let’s go back to the C++ source code for the native method hashCode().

static inline intptr_t get_next_hash(Thread* current, oop obj) {
  intptr_t value = 0;
  if (hashCode == 0) {
    // This form uses global Park-Miller RNG.
    // On MP system we'll have lots of RW access to a global, so the
    // mechanism induces lots of coherency traffic.
    value = os::random();
  } else if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations. This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
  } else if (hashCode == 2) {
    value = 1;            // for sensitivity testing
  } else if (hashCode == 3) {
    value = ++GVars.hc_sequence;
  } else if (hashCode == 4) {
    value = cast_from_oop<intptr_t>(obj);
  } else {
    // Marsaglia's xor-shift scheme with thread-specific state
    // This is probably the best overall implementation -- we'll
    // likely make this the default in future releases.
    unsigned t = current->_hashStateX;
    t ^= (t << 11);
    current->_hashStateX = current->_hashStateY;
    current->_hashStateY = current->_hashStateZ;
    current->_hashStateZ = current->_hashStateW;
    unsigned v = current->_hashStateW;
    v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
    current->_hashStateW = v;
    value = v;
  }

  value &= markWord::hash_mask;
  if (value == 0) value = 0xBAD;
  assert(value ! = markWord::no_hash,"invariant");
  return value;
}
Copy the code

If you don’t have a C++ foundation, you can just scratch the surface of the get_next_hash() method without looking at every line of code in detail. The hashCode variable is a global parameter at JVM startup that allows you to switch the hash value generation strategy.

hashCode==0, calling the OS of the operating systemrandom()Method returns a random number.
hashCode == 1In STW (stop-the-world) operations, this policy is usually used in synchronization scenarios. Compute using object addresses, using random numbers that are not updated frequently (GVars.stw_random) Get involved.
hashCode == 2, using return 1, for tests in some cases.
hashCode == 3, which starts at 0, is not thread safe, and multiple threads may get the same hash.
hashCode == 4, is related to the memory location of the object created, output as is.
hashCode == 5, the default value, supports multithreading and uses Marsaglia’s Xor-shift algorithm to generate pseudo-random numbers. The so-called XOR-shift algorithm, in simple terms, looks like a shift register, and each bit moved into the register is xOR generated by a number of bits. The so-called pseudo-random number is not completely random, but true random generation is more difficult, so as long as it can pass a certain random number statistical detection, it can be used as true random number.

As for the deeper excavation, involving the knowledge of mathematics and physics, will not be expanded. Food, after all, is the original sin.

Where that push past

Second brother in CSDN wrote a lot of Java aspects of the series of articles, Java core syntax, Java collection framework, Java IO, Java concurrent programming, Java virtual machine, etc., is also a complete system.

In order to help more Java beginners, the second brother put his serialized “teach younger sister to learn Java” open source to GitHub, although only organized 50, found that the number of words has come to 100,000 +, the content is not to say, easy to understand, funny, illustrated.

GitHub: github.com/itwanger/jm…

If it helps, please give me a like, it will be the strongest motivation for me to continue to share!

To understand hashCode thoroughly, I drilled into the JDK source code

Where that push past

Related Posts

Drill into the JVM class loader

Spring – Mysql WebFlux integration

SpringCloudConsul source code analysis (four) health check