This article is participating in the “Java Theme Month – Java Brush questions punch card”, see the activity link for details
Why does HashMap perturb functions?
What are the problems with the simple HashMap I wrote last time? A HashMap is not strictly a hash data structure
-
All elements are stored in an index location, and if the location of the element is not enough for the hash collision, then the hash table is not useful, not as expected
-
All elements need to be stored in an index location, and if the location of the element is not enough for the hash collision, then the hash table is useless and the desired performance is not achieved.
-
In the calculation formula to obtain the index ID, the array length needs to be a multiple of 2, then how to initialize the array size.
-
The smaller the array, the bigger the collision, the bigger the array, the smaller the collision, how to choose between time and space. 4. There are 7 elements in the list, and there are 2 strings in the list.
-
As the elements are added, the array length is not enough to expand, how to split the original elements into new locations.
These problems can be summarized as; Perturbation function, initialization capacity, load factor, expansion method, use of linked list and red-black tree transformation, etc.
The content of this chapter is disturbance function!
What is the disturbance function?
When storing elements in a HashMap, there is code to handle hash values. This is the Hash perturbation function of Java 8, used to optimize the hash effect.
Version 1.7
final int hash(Object k) { int h = hashSeed; if (0 ! = h && k instanceof String) { return sun.misc.Hashing.stringHash32((String) k); } h ^= k.hashCode(); h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); }Copy the code
Version 1.8
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code
In version 1.8, the hash method is simplified. This may be because the discrete distribution is not improved significantly, or it is reduced for efficiency. We’re not going to go into the details here
Third, why use disturbance function?
Theoretically the string hashCode is an int, and that can be used directly as an array subscript, without collision, but the value of hashCode is in the range [-2147483648, 2147483647], and above 32 the compiler is -2^31 to 2^31-1, Java uses the binary complement mechanism, the first symbol bit, the maximum value is 0x7FFFFFFF, i.e. (2147483647) is also the maximum value of int
With a length of nearly 4 billion, no one can initialize an array that big, nor can it fit in memory.
DEFAULT_INITIAL_CAPACITY = 1 << 4. Therefore, the Hash value obtained cannot be used as a subscript directly. It needs to be modulo with the array length to obtain a subscript value, which is the example in the previous article
(h = key.hashCode()) ^ (h >>> 16). By moving the hash 16 bits to the right, which is exactly half its length, and then xor with the original hash, you mix the high and low values of the original hash, increasing the randomness. The calculation method is shown below.
In plain English, the purpose of using perturbation functions is to increase randomness, make data elements more evenly hashed, and reduce collisions.
4. Experimental verification of disturbance function
The perturbation function uses the high and low half of the hash value to do xOR, and mixes the high and low of the original hash code, so as to increase the randomness of the low region. But if you don’t see the experimental data, it’s still a theory, and I don’t know if this hash is actually randomised or not. So we’re going to do an experiment here, and this experiment goes like this;
-
Choose a 100,000-word thesaurus
-
Defines a 128-bit array lattice
-
Calculate the number of subscripts of 100,000 words allocated to 128 cells under perturbation and unperturbation respectively
-
The number of each grid is counted and the wave curve is generated. If the fluctuation curve under the perturbation function is relatively more stable, then the perturbation function is proved to be effective.
1. Disturbed code testing
Perturbation function contrast method
Public class Disturb {// Disturb hashidx Disturb function, Public static int disturbHashIdx(String key, int size) { return (size - 1) & (key.hashCode() ^ (key.hashCode() >>> 16)); } public static int hashIdx(String key, int size) {return (size-1) &key.hashCode (); }}Copy the code