This article is based on OpenJDK 11, HotSpot VIRTUAL machine

Hashcode is a method that you’ll probably come across a lot during development to generate hash codes, so how does it work underneath? What should we pay attention to when using it?

The underlying implementation of the hashCode () method

Hashcode () is an Object method:

@HotSpotIntrinsicCandidate
public native int hashCode();
Copy the code

It is a native method, and is decorated @ HotSpotIntrinsicCandidate annotations, prove that it is a HotSpot in a set of effective implementation, the efficient implementation based on CPU instructions.

Synchronizer.cpp: synchronizer.cpp

static inline intptr_t get_next_hash(Thread* self, oop obj) { intptr_t value = 0; if (hashCode == 0) { value = os::random(); } else if (hashCode == 1) { intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3; value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random; } else if (hashCode == 2) { value = 1; } else if (hashCode == 3) { value = ++GVars.hc_sequence; } else if (hashCode == 4) { value = cast_from_oop<intptr_t>(obj); } else { unsigned t = self->_hashStateX; t ^= (t << 11); self->_hashStateX = self->_hashStateY; self->_hashStateY = self->_hashStateZ; self->_hashStateZ = self->_hashStateW; unsigned v = self->_hashStateW; v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)); self->_hashStateW = v; value = v; } value &= markWord::hash_mask; if (value == 0) value = 0xBAD; assert(value ! = markWord::no_hash, "invariant"); return value; }Copy the code

The hashcode strategy is used to generate hash values based on the value of the global variable hashcode. Look at globals.hpp to see which one it is:

 experimental(intx, hashCode, 5, "(Unstable) select hashCode generation algorithm")  
Copy the code

Discovery is an experimental JVM variable, in which case you must add an additional parameter to modify it, as shown below:

-XX:+UnlockExperimentalVMOptions -XX:hashCode=2
Copy the code

Also, the hashCode defaults to 5.

Are hashes recomputed each time the hashcode() method is called?

For classes that do not override the hashcode() method, each instance calls the hashcode() method only the first time the hash value is computed, after which the hash value is stored in the MarkWord in the object header.

(The image above is from:www.cnblogs.com/helloworldc…

If you enter various lock states, it will be cached elsewhere, usually in the thread that acquired the lock, and restoring no lock (i.e. releasing the lock) will be changed back to the original hash value.

Object header structure, as well as object storage structure, you can refer to: Java GC details – 1. Understand Java object structure

-xx :hashCode=0 Generates hash values using the Park-Miller pseudorandom number generator

if (hashCode == 0) {
    value = os::random();
}
Copy the code

Call OS random method to generate random number. This method is implemented as os.cpp:

// Initial seed, the default is 1 volatile unsigned int OS ::_rand_seed = 1; static int random_helper(unsigned int rand_seed) { /* standard, well-known linear congruential random generator with * next_rand = (16807*seed) mod (2**31-1) * see * (1) "Random Number  Generators: Good Ones Are Hard to Find", * S.K. Park and K.W. Miller, Communications of the ACM 31:10 (Oct 1988), * (2) "Two Fast Implementations of the 'Minimal Standard' Random * Number Generator", David G. Carta, Comm. ACM 33, 1 (Jan 1990), pp. 87-88. */ const unsigned int a = 16807; const unsigned int m = 2147483647; const int q = m / a; assert(q == 127773, "weird math"); const int r = m % a; assert(r == 2836, "weird math"); // compute az=2^31p+q unsigned int lo = a * (rand_seed & 0xFFFF); unsigned int hi = a * (rand_seed >> 16); lo += (hi & 0x7FFF) << 16; // if q overflowed, ignore the overflow and increment q if (lo > m) { lo &= m; ++lo; } lo += hi >> 15; // if (p+q) overflowed, ignore the overflow and increment (p+q) if (lo > m) { lo &= m; ++lo; } return lo; } int os::random() { // Make updating the random seed thread safe. while (true) { unsigned int seed = _rand_seed; unsigned int rand = random_helper(seed); //CAS update if (Atomic:: CMPXCHG (&_rand_seed, seed, rand) == seed) {return static_cast<int>(rand); }}}Copy the code

Random_helper is the implementation of the random number generation formula, which is:Here, a is 16807, c is 0, and m is 2 to the 31-1

Since these random numbers are all from the same generator, CAS updates the same seed, and there may be performance issues if there are a large number of generated new objects that all call the hashcode() method. Calling the hashcode() method of the same object repeatedly won’t be a problem because, as mentioned earlier, it is cached.

-xx :hashCode=1 or 4 based on the object pointer OOPs

The OOPs (Ordinary Object Pointers) Object pointer is part of the Object header. Object header structure, as well as object storage structure, you can refer to: Java GC details – 1. Understand Java object structure. It can be simply understood as a description of the location of an object in memory.

else if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations.  This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
}
else if (hashCode == 4) {
    value = cast_from_oop<intptr_t>(obj);
}
Copy the code

Cast_from_oop is a simple way to get the address to which oopDesc, the oop implementation base class, points. Understand Java object structure) :

template <class T> inline T cast_from_oop(oop o) {
  return (T)(CHECK_UNHANDLED_OOPS_ONLY((oopDesc*))o);
}
Copy the code

When -xx :hashCode=4, use the OOP address as the hash value. -xx :hashCode=1; Stop The World (STW) stw_random; The addr_bits ^ (addr_bits >> 5) ^ gvars.stw_random transform reduces hash collisions and makes hash values more hashed.

For more information on Stop the World, see SafePoint and Stop the World.

-xx :hashCode=2 sensitive test, constant 1

else if (hashCode == 2) {
    value = 1;            // for sensitivity testing
}
Copy the code

It is used to test whether certain collections are sensitive to hash values.

-xx :hashCode=3 self-increasing sequence

else if (hashCode == 3) {
    value = ++GVars.hc_sequence;
}

struct SharedGlobals {
  // omitted
  DEFINE_PAD_MINUS_SIZE(1, DEFAULT_CACHE_LINE_SIZE, sizeof(volatile int) * 2);
  // Hot RW variable -- Sequester to avoid false-sharing
  volatile int hc_sequence;
  DEFINE_PAD_MINUS_SIZE(2, DEFAULT_CACHE_LINE_SIZE, sizeof(volatile int));
};
static SharedGlobals GVars;
Copy the code

Every time you create a new object, you call the hash value, the increment +1, and you can see that the hash is very bad, and it’s very easy to hash collisions.

-xx :hashCode=5 Default implementation

else { // Marsaglia's xor-shift scheme with thread-specific state // This is probably the best overall implementation --  we'll // likely make this the default in future releases. unsigned t = self->_hashStateX; t ^= (t << 11); self->_hashStateX = self->_hashStateY; self->_hashStateY = self->_hashStateZ; self->_hashStateZ = self->_hashStateW; unsigned v = self->_hashStateW; v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)); self->_hashStateW = v; value = v; }Copy the code

The algorithm is Marsaglia’s XOR-Shift random number generation method. This paper mainly proposes a fast hash algorithm with good hash performance.

Special hashes cause problems in some scenarios

We often use the hash value of an object or a field to obtain the subscript by modulo the length of an array, and then take out the object corresponding to the subscript of the array for further processing. This is common in load balancing, task scheduling, and thread allocation. Is there a problem with the following code?

Int index = math.abs (userid.hashCode ()); Return useravatarList.get (index % useravatarlist.size ()).geturl (); return useravatarlist.get (index % useravatarlist.size ()).geturl ();Copy the code

In most cases, this is fine, but suppose the userId is one of these strings with a hash value of integer.min_value:

System.out.println("polygenelubricants".hashCode());
System.out.println("GydZG_".hashCode());
System.out.println("DESIGNING WORKHOUSES".hashCode());
Copy the code

Output:

- 2147483648-2147483648-2147483648Copy the code

For these values, if you take the absolute value of math.abs (), we know that math.abs (integer.min_value) is still equal to integer.min_value, because the underlying implementation:

public static int abs(int a) {
    return (a < 0) ? -a : a;
}
Copy the code

– integer. MIN_VALUE and integer. MIN_VALUE are equal. Integer.MIN_VALUE specifies whether the modulus of the Integer is negative, so that an exception will be raised when the object corresponding to the subscript is fetched.

Therefore, it needs to be changed to:

int index = Math.abs(userId.hashCode() % userAvatarList.size());
return userAvatarList.get(index).getUrl();
Copy the code