How are the values of hashCode generated? Object memory address?


Let’s start with the simplest print

System.out.println(new Object());

This prints the fully qualified class name, followed by an @ sign and a string:

java.lang.Object@6659c656

What comes after the @ sign? Is it hashcode or the memory address of the object? Or something else?

The part after the @ is the object's hashCode in hexadecimal. We can verify this:

Object o = new Object();
int hashcode = o.hashCode();
// toString
System.out.println(o);
// hashcode hexadecimal
System.out.println(Integer.toHexString(hashcode));
// hashcode
System.out.println(hashcode);
// System.identityHashCode also returns the object's hashCode; unlike Object.hashCode, it ignores any overridden hashCode method
System.out.println(System.identityHashCode(o));

Output result:

java.lang.Object@6659c656
6659c656
1717159510
1717159510
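The match between the first two output lines follows from how Object.toString() is documented: class name, then @, then the hexadecimal form of hashCode(). The sketch below rebuilds that string by hand and contrasts hashCode() with System.identityHashCode() for a class that overrides hashCode(). The class Fixed and the helper defaultToString are hypothetical names used only for illustration.

```java
public class ToStringDemo {
    // Hypothetical class that overrides hashCode() with a constant.
    static class Fixed {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    // Rebuilds the string that Object.toString() is documented to return.
    static String defaultToString(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object plain = new Object();
        // Same string as the built-in toString(): prints true
        System.out.println(defaultToString(plain).equals(plain.toString()));

        Fixed fixed = new Fixed();
        // Fixed inherits toString(), which calls the overridden hashCode(): ends with @2a
        System.out.println(fixed);
        // identityHashCode ignores the override and returns the JVM-generated value
        System.out.println(System.identityHashCode(fixed));
    }
}
```

The last printed value differs from run to run, since it comes from the JVM rather than from the overridden method.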

So how does the object’s Hashcode actually get generated? Is it really the memory address?

This article is based on Java 8 and the HotSpot JVM.

HashCode generation logic

The logic for generating hashCode in the JVM is not that simple, and it provides several strategies, each of which produces a different result.

Take a look at the core method of generating hashCode in the OpenJDK source code:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations. This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = intptr_t(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = intptr_t(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

As the source shows, the generation strategy is selected by a global variable named hashCode, which defaults to 5. This variable is defined in a separate header file (globals.hpp):

  product(intx, hashCode, 5, "(Unstable) select hashCode generation algorithm")

The description in the source is clear: "(Unstable) select hashCode generation algorithm". Because this is a JVM flag, it can be set with a startup parameter. First, check the default value:

java -XX:+PrintFlagsFinal -version | grep hashCode

intx hashCode                                  = 5                                   {product}
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.282-b08, mixed mode)

So we can select a different hashCode generation algorithm with a JVM startup parameter and compare the results of each one:

-XX:hashCode=N
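To confirm from inside a running program which value the flag actually has, one option is the HotSpot diagnostic MXBean. The following is a minimal sketch, assuming a HotSpot-based JVM where com.sun.management.HotSpotDiagnosticMXBean is available and the hashCode flag is visible to it (as it is on the Java 8 build used here). The class name HashCodeFlagCheck and the "unknown" fallback are illustrative choices, not part of any JDK API.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HashCodeFlagCheck {
    // Returns the current value of the hashCode VM flag as a string,
    // or "unknown" if this JVM does not expose it (assumed fallback).
    static String hashCodeFlag() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("hashCode").getValue();
        } catch (RuntimeException e) {
            // Covers a missing bean (NullPointerException) and an unknown flag name
            // (IllegalArgumentException).
            return "unknown";
        }
    }

    public static void main(String[] args) {
        // Try running with and without -XX:hashCode=N and compare the output.
        System.out.println("hashCode flag = " + hashCodeFlag());
    }
}
```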

Now let's look at how each hashCode generation algorithm behaves.

The 0th algorithm

if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
}

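This algorithm uses a Park-Miller random number generator. Note, however, that the generator keeps its state in a single unguarded global variable: as the source comment says, two threads can race and end up with the same value, and under high concurrency the heavy read/write traffic on that global causes a lot of cache-coherency overhead.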
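For readers unfamiliar with the name, here is a minimal Java sketch of a Park-Miller ("minimal standard") generator, assuming the classic constants 16807 and 2^31 - 1. It only illustrates the recurrence; it is not a port of HotSpot's os::random(), and the class and method names are made up for this example.

```java
public class ParkMillerSketch {
    static final long MULTIPLIER = 16807L;      // 7^5
    static final long MODULUS = 2147483647L;    // 2^31 - 1

    // One step of the recurrence: next = (MULTIPLIER * seed) mod MODULUS.
    // The product is computed in a long, so it cannot overflow.
    static int next(int seed) {
        return (int) ((MULTIPLIER * seed) % MODULUS);
    }

    // Applies the recurrence the given number of times.
    static int advance(int seed, int steps) {
        int value = seed;
        for (int i = 0; i < steps; i++) {
            value = next(value);
        }
        return value;
    }

    public static void main(String[] args) {
        // Well-known check value for this generator: seed 1, 10000 steps.
        System.out.println(advance(1, 10000)); // prints 1043618065
    }
}
```

Each value depends on the previous one, so a real implementation has to keep the seed somewhere. Keeping it in one global, as the source comment describes, is what makes all threads touch the same memory location.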

The first algorithm

if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations. This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addrBits = intptr_t(obj) >> 3 ;
    value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
}

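This algorithm really is based on the object's memory address, but it does not return the address as is: it casts the object pointer to intptr_t, shifts it right by 3 bits, and XORs the result with a further-shifted copy of itself and with the global value GVars.stwRandom. According to the source comment, the result is stable between stop-the-world (STW) operations.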
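Java code cannot read an object's address, so the sketch below takes a made-up address and a made-up stwRandom value as plain long parameters and applies the same shift-and-XOR steps, followed by the masking and 0xBAD fallback from get_next_hash. The 31-bit mask is an assumption that matches a 64-bit HotSpot build; all names are illustrative.

```java
public class AddressHashSketch {
    // Assumption: 31 hash bits, as on a 64-bit HotSpot build.
    static final long HASH_MASK = 0x7FFFFFFFL;

    // Mirrors the hashCode == 1 branch for a hypothetical address and stwRandom.
    static long hashFromAddress(long address, long stwRandom) {
        long addrBits = address >> 3;
        long value = addrBits ^ (addrBits >> 5) ^ stwRandom;
        value &= HASH_MASK;
        return value == 0 ? 0xBAD : value;  // zero is reserved, so fall back to 0xBAD
    }

    public static void main(String[] args) {
        // 0x1000 >> 3 = 512, 512 >> 5 = 16, 512 ^ 16 = 528
        System.out.println(hashFromAddress(0x1000L, 0L)); // prints 528
        // The same address with a different stwRandom gives a different hash.
        System.out.println(hashFromAddress(0x1000L, 12345L));
    }
}
```

As long as neither the address nor stwRandom changes, the function returns the same value, which matches the "stable between STW operations" remark in the source comment.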

The second algorithm

if (hashCode == 2) {
    value = 1 ;            // for sensitivity testing
}

This one needs little explanation: it always returns 1, and judging by the comment it is meant for internal testing.

If you're interested, start the JVM with -XX:hashCode=2 to turn this algorithm on and check that every hashCode is 1.
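A constant hash is legal as far as the hashCode contract goes; it only hurts performance. The sketch below shows this at the Java level with a hypothetical Key class whose hashCode() always returns 1: a HashMap still stores and finds every entry, because it falls back on equals() to tell colliding keys apart.

```java
import java.util.HashMap;
import java.util.Map;

public class ConstantHashDemo {
    // Hypothetical key type: every instance has the same hash code.
    static final class Key {
        final int id;

        Key(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 1;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
    }

    // Fills a map with n keys that all collide on the same hash value.
    static Map<Key, String> build(int n) {
        Map<Key, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i), "v" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Key, String> map = build(100);
        System.out.println(map.size());          // prints 100
        System.out.println(map.get(new Key(7))); // prints v7
    }
}
```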

The third algorithm

if (hashCode == 3) {
    value = ++GVars.hcSequence ;
}

This algorithm is also very simple: it is a counter. Every object draws its hashCode from the same incrementing variable. Let's try it out:

System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());

//output
java.lang.Object@144
java.lang.Object@145
java.lang.Object@146
java.lang.Object@147
java.lang.Object@148
java.lang.Object@149

Sure enough, the values increase by one each time. Kind of interesting.
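The behavior can be imitated in a few lines of Java. This is only a sketch: the class name is made up, and the starting value 0x143 is chosen purely so that the output lines up with the run above (presumably the earlier numbers were handed out to objects hashed before this code ran).

```java
public class SequenceHashSketch {
    // Plain, unsynchronized counter, mirroring ++GVars.hcSequence in the source.
    private int hcSequence;

    SequenceHashSketch(int start) {
        this.hcSequence = start;
    }

    // Pre-increments the counter and returns the new value.
    int nextHash() {
        return ++hcSequence;
    }

    public static void main(String[] args) {
        SequenceHashSketch sequence = new SequenceHashSketch(0x143);
        for (int i = 0; i < 6; i++) {
            // Prints 144, 145, 146, 147, 148, 149 (hexadecimal), one per line
            System.out.println(Integer.toHexString(sequence.nextHash()));
        }
    }
}
```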

The fourth algorithm

if (hashCode == 4) {
    value = intptr_t(obj) ;
}

This one is close to the first algorithm, since both start from the object's address. The fourth uses the address as is (cast to intptr_t), while the first applies shifts and XORs to it.
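Even in this mode the returned hashCode is not the full address, because get_next_hash masks the value with hash_mask before returning it. A minimal sketch with made-up addresses, assuming a 31-bit mask as on a 64-bit build (names are illustrative):

```java
public class RawAddressHashSketch {
    // Assumption: 31 hash bits, as on a 64-bit HotSpot build.
    static final long HASH_MASK = 0x7FFFFFFFL;

    // Mirrors the hashCode == 4 branch plus the final masking step.
    static long hashFromRawAddress(long address) {
        long value = address & HASH_MASK;
        return value == 0 ? 0xBAD : value;
    }

    public static void main(String[] args) {
        long a = 0x1000L;
        long b = a + (1L << 31);  // a different hypothetical address, 2 GiB away

        // Both addresses lose their upper bits and collapse to the same hash: prints true
        System.out.println(hashFromRawAddress(a) == hashFromRawAddress(b));

        // An 8-byte-aligned address always has its low 3 bits clear: prints 0
        System.out.println(hashFromRawAddress(a) & 7);
    }
}
```

The second line hints at why the first algorithm shifts the address right by 3 before mixing it: with 8-byte alignment those low bits carry no information.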

The fifth algorithm

The final, and default, generation algorithm is used when hashCode configuration is not equal to 0/1/2/3/4:

else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
}

This is Marsaglia's xor-shift scheme: each new hash value is computed from the current thread's own state using nothing but shifts and XOR operations. Because no global state is shared between threads, it is more efficient than the increment and random algorithms above. The values can still repeat, though.
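The same steps can be written in Java, using int for the 32-bit state and >>> for the unsigned right shifts. This is a sketch rather than a faithful port: the seed values below are arbitrary numbers chosen for illustration, the 31-bit mask assumes a 64-bit build, and the ThreadLocal stands in for the per-thread fields that HotSpot keeps on its Thread object.

```java
public class XorShiftHashSketch {
    // Four words of generator state, one set per thread in the real JVM.
    private int x, y, z, w;

    XorShiftHashSketch(int x, int y, int z, int w) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }

    // One generator per thread; the seeds here are arbitrary illustration values.
    static final ThreadLocal<XorShiftHashSketch> PER_THREAD = ThreadLocal.withInitial(
            () -> new XorShiftHashSketch((int) Thread.currentThread().getId() + 1, 2, 3, 4));

    // Mirrors the default branch: xor-shift step, then mask and 0xBAD fallback.
    int next() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        int value = v & 0x7FFFFFFF;
        return value == 0 ? 0xBAD : value;
    }

    public static void main(String[] args) {
        XorShiftHashSketch generator = new XorShiftHashSketch(1, 2, 3, 4);
        System.out.println(generator.next()); // prints 2061
        // Each thread that calls this gets, and advances, its own generator.
        System.out.println(PER_THREAD.get().next());
    }
}
```

Since every thread works on its own four state words, no locking or shared writes are involved, which is the efficiency argument made above.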

The JVM does not guarantee that the values are unique, which is why structures such as HashMap have to resolve hash collisions.
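One way to see this for yourself is to generate a large number of identity hash codes and count how many values come up more than once. The sketch below does that; the class name and the object count are arbitrary, and the result depends on the JVM, the selected algorithm, and the run, so no particular output should be expected.

```java
import java.util.HashSet;
import java.util.Set;

public class IdentityHashCollisions {
    // Creates the given number of objects and counts how many identity hash
    // codes had already been seen on an earlier object.
    static int countDuplicates(int objects) {
        Set<Integer> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < objects; i++) {
            if (!seen.add(System.identityHashCode(new Object()))) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // The printed number varies from run to run.
        System.out.println(countDuplicates(1_000_000));
    }
}
```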

Conclusion

A hashCode may or may not be derived from the memory address; it can even be the constant 1 or an incrementing counter! Which algorithm is used depends entirely on how the JVM is configured.
