Preface
A few days ago, a conversation with a friend led to the article “GC copies live objects, does its memory address change?” There we learned that in the HotSpot virtual machine, an object’s address changes when GC moves it.
We also know that calling the hashCode method on the same object must always return the same value, so an object’s hashCode stays constant throughout its lifetime. Meanwhile, it is widely repeated online that “hashCode is generated based on object address”. So how does hashCode stay the same when the object’s address changes?
The hashCode contract
Before continuing, let’s look at the rules and notes that apply to the hashCode method.
The Javadoc comment on java.lang.Object specifies three rules for the hashCode method, which can be summarized as follows:
First, as long as the fields used by an object’s equals method are unchanged, repeated calls to hashCode must return the same value.
Second, if two objects are equal according to equals(Object), their hashCode values must be equal.
Third, if two objects are not equal according to equals(Object), their hashCode values are not required to differ, but producing distinct values for unequal objects improves hash table performance.
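The three rules above can be illustrated with a minimal sketch. The Point class and the demo class name are hypothetical, introduced only for this example; the point is that hashCode is built from exactly the fields that equals compares.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class HashCodeContractDemo {
    /** Hypothetical value class, used only for illustration. */
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            // Built from exactly the fields that equals compares
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);

        // Rule 1: repeated calls return the same value
        System.out.println(a.hashCode() == a.hashCode()); // true
        // Rule 2: equal objects have equal hash codes
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true

        // Hash-based collections rely on rule 2 to find equal keys
        Set<Point> set = new HashSet<>();
        set.add(a);
        System.out.println(set.contains(b)); // true
    }
}
```

If hashCode were left at its default while equals was overridden, the last lookup would usually fail, because the two equal points would most likely land in different buckets.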
The same Javadoc comment also contains this note:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
According to this description, hashCode is typically derived from the object’s memory address. This is the source of the claim that hashCode is generated from the object’s address.
However, we know that an object’s memory address changes when the JVM performs GC, whether it uses a mark-copy or a mark-compact algorithm. Yet hashCode must stay the same. How exactly does the JVM achieve this? A second question: if an object is moved and its old location is then occupied by another object, will the new object get the same hashCode as the previous one?
Let’s explore the implementation of the JVM.
hashCode values before and after GC
Let’s start with an example that checks an object’s address and hashCode value before and after GC. Add the JOL dependency to the project:
<dependency>
<groupId>org.openjdk.jol</groupId>
<artifactId>jol-core</artifactId>
<version>0.10</version>
</dependency>
The verification code is as follows:
import org.openjdk.jol.vm.VM;

public class TestHashCode {
    public static void main(String[] args) {
        Object obj = new Object();
        // Record the address and hash code before GC
        long address = VM.current().addressOf(obj);
        long hashCode = obj.hashCode();
        System.out.println("before GC : The memory address is " + address);
        System.out.println("before GC : The hash code is " + hashCode);

        // Allocate a few throwaway objects, then request a GC
        new Object();
        new Object();
        new Object();
        System.gc();

        // Record the address and hash code after GC
        long afterAddress = VM.current().addressOf(obj);
        long afterHashCode = obj.hashCode();
        System.out.println("after GC : The memory address is " + afterAddress);
        System.out.println("after GC : The hash code is " + afterHashCode);
        System.out.println("---------------------");

        System.out.println("memory address = " + (address == afterAddress));
        System.out.println("hash code = " + (hashCode == afterHashCode));
    }
}
If no GC occurs, shrink the heap with JVM parameters, for example to 16 MB: -Xms16m -Xmx16m -XX:+PrintGCDetails.
Run the above code, and the following log is displayed:
before GC : The memory address is 31856020608
before GC : The hash code is 2065530879
after GC : The memory address is 28991167544
after GC : The hash code is 2065530879
---------------------
memory address = false
hash code = true
As the console output shows, the object’s address does change across the GC, but its hashCode does not. The hashCode value also looks nothing like the memory address; going by the Javadoc comment on the hashCode method alone, we could only assume that the two are somehow related.
Why hashCode does not change
The example above makes it clear that the original hashCode value is stored somewhere and reused after GC. In HotSpot, the most straightforward approach is to reserve part of the object header (25 bits on 32-bit machines, 31 bits on 64-bit machines) for the hashCode value. But this adds extra information to every object, and in most cases the hashCode method is never called, so the effort and space would be wasted.
So how does the JVM optimize this? Until the hashCode method is called, the header bits reserved for the hashCode are 0. The value is computed only on the first call to hashCode (essentially System#identityHashCode) and is then stored in the object header. Later calls simply read the stored value.
This guarantees that the hashCode is unaffected even when GC changes the object’s address. If hashCode was called before the GC, the value is already stored and survives the move. If it is first called after the GC, it is computed at that point and stays fixed from then on.
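The lazy compute-and-store behaviour described above can be modelled with a toy class. This is only a sketch under stated assumptions: SimulatedObject, its address field, and its headerHash field are hypothetical stand-ins for a real object and its header, and the random generator is an arbitrary choice, not HotSpot code.

```java
import java.util.concurrent.ThreadLocalRandom;

class LazyIdentityHashDemo {
    /** Toy stand-in for an object whose "address" can change; not HotSpot code. */
    static final class SimulatedObject {
        long address;            // changes when the simulated GC moves the object
        private int headerHash;  // 0 means "hash not computed yet"

        SimulatedObject(long address) {
            this.address = address;
        }

        int identityHash() {
            if (headerHash == 0) {
                // First call: generate a value and store it in the "header"
                int h = ThreadLocalRandom.current().nextInt() & 0x7FFFFFFF; // keep 31 bits
                headerHash = (h == 0) ? 1 : h; // 0 is reserved for "not computed"
            }
            // Later calls only read the stored value; the address is never consulted
            return headerHash;
        }

        boolean hashComputed() {
            return headerHash != 0;
        }
    }

    public static void main(String[] args) {
        SimulatedObject obj = new SimulatedObject(0x1000L);
        System.out.println("computed before first call: " + obj.hashComputed()); // false

        int before = obj.identityHash();
        obj.address = 0x8000L; // simulate GC moving the object
        int after = obj.identityHash();

        System.out.println("same hash after move: " + (before == after)); // true
    }
}
```

Because the stored value travels with the object, moving it to a new address has no effect on what later calls return.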
How hashCode is generated
Different JVMs generate hashCode values differently. OpenJDK provides six generation strategies:
- 0: A randomly generated number.
- 1: A function of the object’s memory address.
- 2: The hardcoded value 1 (used for sensitivity testing).
- 3: A value taken from a sequence.
- 4: The memory address of the object, cast to int.
- 5: Thread state combined with xorshift.
OpenJDK 6 and OpenJDK 7 use strategy 0 (random number) by default, while OpenJDK 8 and 9 default to strategy 5. So, judging purely by the OpenJDK implementation, the default hashCode has nothing to do with the object’s memory address. The Javadoc comment on the hashCode method of the Object class most likely dates back to early versions that used strategy 4.
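To make strategy 5 more concrete, here is a minimal sketch of an xorshift generator that keeps its state per thread. It follows the general shape of Marsaglia’s xorshift; the class names, the seed source, and the initial constants are assumptions chosen for illustration and should not be read as the JVM’s actual code.

```java
import java.util.concurrent.ThreadLocalRandom;

class XorShiftHashDemo {
    /** Generator state; in this sketch every thread owns one instance. */
    static final class State {
        private int x, y, z, w;

        State(int seed) {
            x = seed;
            y = 842502087;  // initial constants assumed for illustration
            z = 0x8767;
            w = 273326509;
        }

        /** Advances the state and returns a 31-bit non-negative value. */
        int next() {
            int t = x;
            t ^= (t << 11);
            x = y;
            y = z;
            z = w;
            int v = w;
            v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
            w = v;
            return v & 0x7FFFFFFF;
        }
    }

    // One generator per thread, so no locking is needed
    private static final ThreadLocal<State> STATE =
            ThreadLocal.withInitial(() -> new State(ThreadLocalRandom.current().nextInt()));

    /** Produces the next hash value for the calling thread. */
    static int nextHash() {
        return STATE.get().next();
    }

    public static void main(String[] args) {
        // No object address is involved anywhere in the computation
        for (int i = 0; i < 3; i++) {
            System.out.println(nextHash());
        }
    }
}
```

A real implementation would also have to avoid returning 0, since 0 in the header means that no hash has been computed yet.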
hashCode and identityHashCode
We have mentioned both the hashCode method and the identityHashCode method several times. Looking only at the hashCode method of the Object class, it behaves identically to the identityHashCode method provided by the System class. In practice, however, hashCode is often overridden, so an object can have two candidate hash values: the one defined by the parent Object class and the one defined by the overriding class.
In OpenJDK, the header stores the value obtained through System#identityHashCode, while the hashCode of a class that overrides the method is computed on every call by running the overriding implementation.
So if a class overrides the hashCode method, can the original value still be obtained? Yes: System#identityHashCode returns the same unchanging value whether or not the object’s class overrides hashCode.
To verify this, create a Person class that implements the hashCode method:
import java.util.Objects;

public class Person {
    private int id;

    // Setter used by the verification code
    public void setId(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}
The verification code is as follows:
Person person = new Person();
person.setId(1);
System.out.println("Hashcode = " + person.hashCode());
System.out.println("Identity Hashcode = " + System.identityHashCode(person));
Execute the verification program and print the following result:
Hashcode = 32
Identity Hashcode = 1259475182
Notice that System#identityHashCode uses the default hashCode implementation, not the one overridden in the Person class. In essence, the hashCode method of Object is the same identityHashCode computation.
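A further small sketch shows the difference in stability: when the field behind an overridden hashCode changes, the overridden value changes too, while the identity hash stays put. The nested Person class mirrors the one above, and the demo class name is an assumption for this example.

```java
import java.util.Objects;

class IdentityVsOverrideDemo {
    /** Same shape as the Person class above, nested to keep the example in one file. */
    static final class Person {
        private int id;

        void setId(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return Objects.hash(id);
        }
    }

    public static void main(String[] args) {
        Person person = new Person();
        person.setId(1);
        int hashBefore = person.hashCode();
        int identityBefore = System.identityHashCode(person);

        person.setId(2); // change the field the overridden hashCode depends on

        // The overridden hashCode is recomputed on every call, so it changes
        System.out.println(hashBefore != person.hashCode()); // true
        // The identity hash ignores the override and the field value
        System.out.println(identityBefore == System.identityHashCode(person)); // true

        // For a class that does not override hashCode, both calls agree
        Object plain = new Object();
        System.out.println(plain.hashCode() == System.identityHashCode(plain)); // true
    }
}
```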
One more question
Suppose the JVM did generate hashCode values from the object’s memory address. Could the following happen: hashCode is called on Object1, GC then moves Object1, Object2 is allocated at Object1’s old location, and hashCode is then called on Object2, returning the same value? Yes, the two values could be equal, but that does not matter. A hashCode is only a hash value and is not required to be unique. Different objects with the same value are simply a hash collision.
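That equal hash values are harmless can be seen with two well-known strings, "Aa" and "BB", whose hashCode values coincide under the documented String.hashCode formula. The demo class and the build helper are hypothetical names for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class HashCollisionDemo {
    /** Builds a map whose two keys share one hash code. */
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        // 'A'*31 + 'a' = 2112 and 'B'*31 + 'B' = 2112
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        Map<String, Integer> map = build();
        // equals tells the colliding keys apart, so both entries survive
        System.out.println(map.size());     // 2
        System.out.println(map.get("Aa"));  // 1
        System.out.println(map.get("BB"));  // 2
    }
}
```

The hash code only selects a bucket; equals decides which entry inside the bucket matches.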
Let’s verify it
We again use the JOL library to write a program that shows how the object header changes after the hashCode method is called.
// Requires: import org.openjdk.jol.info.ClassLayout;
Object person = new Object();
// Print the object layout before hashCode is called
System.out.println(ClassLayout.parseInstance(person).toPrintable());
// Call the hashCode method, or the System#identityHashCode method if hashCode is overridden
System.out.println(person.hashCode());
// System.out.println(System.identityHashCode(person));
// Print the object layout again
System.out.println(ClassLayout.parseInstance(person).toPrintable());
After running the above program, the console prints the following:
java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

1898220577

java.lang.Object object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
      0     4        (object header)                           01 21 8c 24 (00000001 00100001 10001100 00100100) (613163265)
      4     4        (object header)                           71 00 00 00 (01110001 00000000 00000000 00000000) (113)
      8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
     12     4        (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
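Comparing the output before and after the hashCode call, the value in the row at OFFSET 0 changes from 1 to 613163265 and the value in the row at OFFSET 4 changes from 0 to 113, which means the hashCode value has been stored in the header. Read in little-endian order, the bytes 21 8c 24 71 form 0x71248c21, which is exactly the printed hashCode 1898220577. If neither method is called, nothing is stored.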
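The header bytes in the log can be decoded by hand or with a few lines of code. The sketch below assumes a 64-bit little-endian layout in which the 31 hash bits sit directly above the lowest 8 bits of the first header word; that layout is inferred from the log above, and the class and method names are hypothetical.

```java
class MarkWordDecodeDemo {
    /** Assembles eight little-endian bytes into one 64-bit word. */
    static long littleEndianWord(int[] bytes) {
        long word = 0L;
        for (int i = 0; i < 8; i++) {
            word |= ((long) (bytes[i] & 0xFF)) << (8 * i);
        }
        return word;
    }

    /** Extracts 31 hash bits, assuming they start at bit 8 of the word. */
    static int hashFromMarkWord(long mark) {
        return (int) ((mark >>> 8) & 0x7FFFFFFFL);
    }

    public static void main(String[] args) {
        // Header bytes at offsets 0..7, copied from the log before and after the call
        int[] before = {0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
        int[] after  = {0x01, 0x21, 0x8c, 0x24, 0x71, 0x00, 0x00, 0x00};

        System.out.println(hashFromMarkWord(littleEndianWord(before))); // 0
        System.out.println(hashFromMarkWord(littleEndianWord(after)));  // 1898220577
    }
}
```

The second value matches the hashCode printed between the two layout dumps, and the first confirms that the hash bits were still 0 before the call.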
Summary
As this article shows, the worry that GC moving an object will change its hashCode does not arise when the JVM generates hashCode without using the object’s memory address. Along the way, we learned how hashCode is generated and stored and how it relates to the identityHashCode method, and we practiced a new use of JOL.