Before getting into hashCode, I’ll pose a few questions that I’ll answer later
- What is a hashCode?
- What does hashCode do?
- Why do hash conflicts occur, and can they be avoided?
- Why does overriding equals() require overriding hashCode()?
- How is hashCode generated in Java?
HashCode () : java8 Object: hashCode() : java8 Object: hashCode(
/**
* Returns a hash code value for the object. This method is
* supported for the benefit of hash tables such as those provided by
* {@link java.util.HashMap}.
* <p>
* The general contract of {@code hashCode} is:
* <ul>
* <li>Whenever it is invoked on the same object more than once during
* an execution of a Java application, the {@code hashCode} method
* must consistently return the same integer, provided no information
* used in {@code equals} comparisons on the object is modified.
* This integer need not remain consistent from one execution of an
* application to another execution of the same application.
* <li>If two objects are equal according to the {@code equals(Object)}
* method, then calling the {@code hashCode} method on each of
* the two objects must produce the same integer result.
* <li>It is <em>not</em> required that if two objects are unequal
* according to the {@link java.lang.Object#equals(java.lang.Object)}
* method, then calling the {@code hashCode} method on each of the
* two objects must produce distinct integer results. However, the
* programmer should be aware that producing distinct integer results
* for unequal objects may improve the performance of hash tables.
* </ul>
*/
Copy the code
I can roughly sum up his meaning in the following points
- Each object returns a hashCode, which is used to speed up the hash table
- The return of hashCode() should and should not change if the equals() comparison has not changed when the same Java application is called more than once on the same object during execution. (In particular, different Java programs can allow hashCode to change within the same Java program.)
- According to equals(), if two objects are equal, hashCode must also be equal
- If two objects are not equal according to equals(), hashCode may be equal, but make them as different as possible to improve hash table performance
What is a hashCode?
HashCode is used to speed up hash tables. In the same Java application, hashCode is a signature of a Java object, and equal objects must be equal hashCode.
What does hashCode do?
It’s used to speed up the hash table, as I mentioned, in space for time.
This blog post is not about hashing tables, so I won’t expand it here
The idea is to use hashcode to find the corresponding hash bucket in the hash table, and then call equals() to determine whether the elements mounted in the hash bucket are equal.
Why do equal elements appear in the same hash bucket?
As mentioned earlier, hashcode objects that are equal must be equal, but hashcode objects that are equal are not necessarily equal, so you need to call equals() again
Why do hash conflicts occur, and can they be avoided?
Let’s take a look at the method signature of hashCode in Java
public native int hashCode(a);
Copy the code
It can be seen that hashCode is essentially an int. The range of data represented by int is limited, while the object is infinite. When an infinite set is compressed into a finite set, hash conflict becomes an inevitable event, and hash conflict is inevitable.
Of course, in our application, objects cannot be infinite, so choosing a good hash algorithm can reduce the probability of hash conflicts
Why does overriding equals() require overriding hashCode()?
This issue has actually been mentioned several times before. According to the Java specification, hashCode() must be equal to equals()
Overwriting equals() invariably results in a change in whether objects are equal, so the hashCode() method needs to be modified
If you change equals() without changing hashCode(), the hash bucket cannot be accurately located when using the hash table, causing the hash table to work improperly
How is hashCode generated in Java?
This is a very complicated problem, so let me extract some comments from java8 Object
/**
* As much as is reasonably practical, the hashCode method defined by
* class {@code Object} does return distinct integers for distinct
* objects. (This is typically implemented by converting the internal
* address of the object into an integer, but this implementation
* technique is not required by the
* Java™ programming language.)
*/
Copy the code
Most of the time hashCode is a map of an object’s memory address, but in Java it is not.
So how is hashCode generated in Java?
In fact, Java’s default HashCode is generated by a set of random algorithms that depend only on the order and thread in which objects are generated.
So let me prove it in code
public static void main(String[] args) {
for (int i = 0; i < 5; i++) {
Object o = new Object();
System.out.println("hashCode:" + o.hashCode());
System.out.println("address:"+ VM.current().addressOf(o)); }}Copy the code
The Unsafe class was straightforward to retrieve the memory address, but it was a bit cumbersome, so I used a packaged toolkit
<dependency>
<groupId>org.openjdk.jol</groupId>
<artifactId>jol-core</artifactId>
<version>0.10</version>
</dependency>
Copy the code
Repeated execution of the above code results in the following
For the first time, | The second time | The third time |
---|---|---|
hashCode:531885035 address:31856529176 hashCode:705265961 address:31874395832 hashCode:428746855 address:31874396520 hashCode:317983781 address:31874397208 hashCode:987405879 address:31874397896 | hashCode:531885035 address:31856529344 hashCode:705265961 address:31874395856 hashCode:428746855 address:31874396544 hashCode:317983781 address:31874397232 hashCode:987405879 address:31874397920 | hashCode:531885035 address:31856529920 hashCode:705265961 address:31874407200 hashCode:428746855 address:31874407888 hashCode:317983781 address:31874408576 hashCode:987405879 address:31874409264 |
We can clearly see that the same order of hashCode is the same, and the memory address is not the same, so we can be sure that Java hashCode and memory are independent, and seem to be related to the creation order.
Next we use code to see what happens when a hash conflict occurs
public static void main(String[] args) {
HashSet<Integer> set = new HashSet<Integer>();
int count = 0, num = 0;
while (true) {
count++;
Object o = new Object();
if (set.contains(o.hashCode())) {
num++;
System.out.println("count:" + count);
System.out.println("hashCode:" + o.hashCode());
if (num == 5) break; } set.add(o.hashCode()); }}Copy the code
Repeat the above code
For the first time, | The second time | The third time |
---|---|---|
count:105730 hashCode:2134400190 count:111177 hashCode:651156501 count:121838 hashCode:1867750575 count:145361 hashCode:2038112324 count:146294 hashCode:1164664992 | count:105730 hashCode:2134400190 count:111177 hashCode:651156501 count:121838 hashCode:1867750575 count:145361 hashCode:2038112324 count:146294 hashCode:1164664992 | count:105730 hashCode:2134400190 count:111177 hashCode:651156501 count:121838 hashCode:1867750575 count:145361 hashCode:2038112324 count:146294 hashCode:1164664992 |
We can see that the location of the hash collision is the same, and the hashCode is the same when the hash collision occurs, so we can conclude that the generation of hashCode is related to the order in which the object is generated.
Next we repeat the above operation with different threads
public static void main(String[] args) throws InterruptedException {
new Thread(() ->{
HashSet<Integer> set = new HashSet<Integer>();
int count = 0, num = 0;
while (true) {
count++;
Object o = new Object();
if (set.contains(o.hashCode())) {
num++;
System.out.println("count:" + count);
System.out.println("hashCode:" + o.hashCode());
if (num == 5) break;
}
set.add(o.hashCode());
}
}).start();
Thread.sleep(1000L);
}
Copy the code
The results are as follows
For the first time, | The second time | The third time |
---|---|---|
count:53869 hashCode:820543451 count:87947 hashCode:951498229 count:99859 hashCode:175788428 count:100940 hashCode:1979813773 count:123438 hashCode:916565907 | count:51579 hashCode:1573800482 count:124111 hashCode:1834341042 count:139761 hashCode:2021276643 count:142276 hashCode:1807321828 count:143515 hashCode:1522017140 | count:69186 hashCode:742597489 count:102967 hashCode:2084692455 count:144290 hashCode:424089733 count:176112 hashCode:53258565 count:178639 hashCode:1046986547 |
There seems to be no pattern, which requires to dig Java source code to solve this phenomenon
Object source code address
In Object we can see that hashCode actually calls IHashCode
static JNINativeMethod methods[] = {
{"hashCode"."()I", (void *)&JVM_IHashCode},
{"wait"."(J)V", (void *)&JVM_MonitorWait},
{"notify"."()V", (void *)&JVM_MonitorNotify},
{"notifyAll"."()V", (void *)&JVM_MonitorNotifyAll},
{"clone"."()Ljava/lang/Object;", (void *)&JVM_Clone},
};
Copy the code
It then finds IHashCode in jvm.cpp and actually calls FastHashCode
JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
JVMWrapper("JVM_IHashCode");
// as implemented in the classic virtual machine; return 0 if object is NULL
return handle == NULL ? 0 : ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)) ;
JVM_END
Copy the code
Then steak FastHashCode FastHashCode in synchronizer. CPP
intptr_t ObjectSynchronizer::FastHashCode (Thread * Self, oop obj) {
if (UseBiasedLocking) {
// NOTE: many places throughout the JVM do not expect a safepoint
// to be taken here, in particular most operations on perm gen
// objects. However, we only ever bias Java instances and all of
// the call sites of identity_hash that might revoke biases have
// been checked to make sure they can handle a safepoint. The
// added check of the bias pattern is to avoid useless calls to
// thread-local storage.
if (obj->mark()->has_bias_pattern()) {
// Box and unbox the raw reference just in case we cause a STW safepoint.
Handle hobj (Self, obj) ;
// Relaxing assertion for bug 6320749.assert (Universe::verify_in_progress() || ! SafepointSynchronize::is_at_safepoint(),"biases should not be seen by VM thread here");
BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current()); obj = hobj() ; assert(! obj->mark()->has_bias_pattern(),"biases should be revoked by now"); }}// hashCode() is a heap mutator ...
// Relaxing assertion for bug 6320749.assert (Universe::verify_in_progress() || ! SafepointSynchronize::is_at_safepoint(),"invariant"); assert (Universe::verify_in_progress() || Self->is_Java_thread() ,"invariant") ;
assert (Universe::verify_in_progress() ||
((JavaThread *)Self)->thread_state() != _thread_blocked, "invariant"); ObjectMonitor* monitor =NULL;
markOop temp, test;
intptr_t hash;
markOop mark = ReadStableMark (obj);
// object should remain ineligible for biased lockingassert (! mark->has_bias_pattern(),"invariant");// Mark is the object header
if (mark->is_neutral()) {
hash = mark->hash(); // Retrieve the hash value.
if (hash) { // if it has hash, just return it
return hash;
}
hash = get_next_hash(Self, obj); // This is the core method for generating HashCode
temp = mark->copy_set_hash(hash); // merge the hash code into header
// use (machine word version) atomic operation to install the hash
test = (markOop) Atomic::cmpxchg_ptr(temp, obj->mark_addr(), mark);
if (test == mark) {
return hash;
}
// If atomic operation failed, we must inflate the header
// into heavy weight monitor. We could add more code here
// for fast path, but it does not worth the complexity.
} else if (mark->has_monitor()) {
monitor = mark->monitor();
temp = monitor->header();
assert (temp->is_neutral(), "invariant"); hash = temp->hash();if (hash) {
return hash;
}
// Skip to the following code to reduce code size
} else if (Self->is_lock_owned((address)mark->locker())) {
temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
assert (temp->is_neutral(), "invariant"); hash = temp->hash();// by current thread, check if the displaced
if (hash) { // header contains hash code
return hash;
}
// WARNING:
// The displaced header is strictly immutable.
// It can NOT be changed in ANY cases. So we have
// to inflate the header into heavyweight monitor
// even the current thread owns the lock. The reason
// is the BasicLock (stack slot) will be asynchronously
// read by other threads during the inflate() function.
// Any change to stack may not propagate to other threads
// correctly.
}
// Inflate the monitor to set hash code
monitor = ObjectSynchronizer::inflate(Self, obj);
// Load displaced header and check it has hash code
mark = monitor->header();
assert (mark->is_neutral(), "invariant"); hash = mark->hash();if (hash == 0) {
hash = get_next_hash(Self, obj);
temp = mark->copy_set_hash(hash); // merge hash code into header
assert (temp->is_neutral(), "invariant"); test = (markOop) Atomic::cmpxchg_ptr(temp, monitor, mark);if(test ! = mark) {// The only update to the header in the monitor (outside GC)
// is install the hash code. If someone add new usage of
// displaced header, please update this code
hash = test->hash();
assert (test->is_neutral(), "invariant") ;
assert (hash != 0."Trivial unexpected object/monitor header usage."); }}// We finally get the hash
return hash;
}
Copy the code
get_next_hash
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
intptr_t value = 0 ;
// Get a random one
if (hashCode == 0) {
// This form uses an unguarded global Park-Miller RNG,
// so it's possible for two threads to race and generate the same RNG.
// On MP system we'll have lots of RW access to a global, so the
// mechanism induces lots of coherency traffic.
value = os::random() ;
} else
// Calculate one based on the memory address
if (hashCode == 1) {
// This variation has the property of being stable (idempotent)
// between STW operations. This can be useful in some of the 1-0
// synchronization schemes.
intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3 ;
value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
} else
/ / returns 1
if (hashCode == 2) {
value = 1 ; // for sensitivity testing
} else
// What do you mean by this
if (hashCode == 3) {
value = ++GVars.hcSequence ;
} else
// Return the memory address directly
if (hashCode == 4) {
value = cast_from_oop<intptr_t>(obj) ;
}
// This is an algorithm for generating random numbers
else {
// Marsaglia's xor-shift scheme with thread-specific state
// This is probably the best overall implementation -- we'll
// likely make this the default in future releases.
unsigned t = Self->_hashStateX ;
t ^= (t << 11); Self->_hashStateX = Self->_hashStateY ; Self->_hashStateY = Self->_hashStateZ ; Self->_hashStateZ = Self->_hashStateW ;unsigned v = Self->_hashStateW ;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)); Self->_hashStateW = v ; value = v ; } value &= markOopDesc::hash_mask;if (value == 0) value = 0xBAD; assert (value ! = markOopDesc::no_hash,"invariant"); TEVENT (hashCode: GENERATE) ;return value;
}
Copy the code
Here we basically know how the hashcode is generated, in jdk1.8 is actually in the fifth generation of way, we can under the Linux system input: Java – XX: + PrintFlagsFinal – version | grep hashcode command to see
Ok, let’s take a look at the fifth way. My first thought when I saw this code was “what the hell is _hashStateX?” Let’s see how thead.cpp defines it:
// thread-specific hashCode stream generator state - Marsaglia shift-xor form
_hashStateX = os::random() ;
_hashStateY = 842502087 ;
_hashStateZ = 0x8767 ; // (int)(3579807591LL & 0xffff) ;
_hashStateW = 273326509 ;
Copy the code
Thread defines a random number and three constants, which are used to generate hashcode according to the algorithm above.
For details, please refer to Xorshift RNGs
Because the above algorithm needs to obtain a random number in the thread as an initial value, the above algorithm has a causal relationship before and after, and the later result is calculated according to the previous result. Therefore, for the same thread, the generation of object hashcode is sequentially related in a sense.
Why is the generation of hashCode in the main function always in order? Because there are fixed values in the main thread.
You can see that hashCode in 1.8 is independent of the memory address, but is dependent on the thread (or random number generated) and the order in which it is generated