This article explains the implementation principle of HashMap in the Java collections framework, and then analyzes the JDK 7 HashMap source code. (JDK 8 differs; you can read the JDK 8 HashMap source code separately.)
The main changes to HashMap between JDK 7 and JDK 8 (a commonly asked interview question) are:

1.7 uses array + linked list; 1.8 uses array + linked list / red-black tree. That is, in 1.8, when a linked list exceeds a certain length, it is converted to red-black tree storage.

1.7 recalculates the hash value and index position during capacity expansion; 1.8 does not recalculate the hash value, and instead cleverly derives the new index position with an & operation after expansion.

1.7 uses head insertion when adding to a linked list, while 1.8 uses tail insertion.

Because 1.7 uses head insertion, expansion reverses the original order of elements in a linked list, which can produce a looped (circular) linked list in concurrent scenarios. 1.8 uses tail insertion, which preserves the original order of the list elements during expansion, so the linked-list loop problem does not occur.
Contents

1. What is a hash table
2. Implementation principle of HashMap
3. Why must the array length of a HashMap be a power of 2?
4. Overriding equals requires overriding hashCode as well
5. Summary
1. What is a hash table
Before we discuss hash tables, let's look at how other data structures perform on basic operations such as insertion and lookup.

Array: data is stored in a contiguous block of storage cells. Lookup by index is O(1). Searching by value requires traversing the array and comparing each element with the given key, which is O(n); for a sorted array, binary search, interpolation search, Fibonacci search, and so on can improve this to O(log n). Ordinary insertions and deletions involve shifting array elements, so their average complexity is also O(n).

Linked list: insertion and deletion (once the target position has been found) only require adjusting references between nodes, which is O(1); searching requires traversing the list and comparing node by node, which is O(n).

Binary tree: in a balanced, ordered binary tree, insertion, search, and deletion all average O(log n).

Hash table: compared with the structures above, a hash table can add, delete, and find elements in a single positioning step (hash conflicts aside), i.e. O(1). Next we will see how a hash table achieves this remarkable constant-time O(1).

As we know, data structures have only two physical storage forms: sequential storage and chained storage. (Stacks, queues, trees, graphs, and so on are abstractions at the logical level; when mapped into memory they take one of these two physical forms.) As mentioned above, locating an element in an array by index takes a single step. A hash table exploits exactly this property: the backbone of a hash table is an array.
For example, if we want to add or find an element, we can map the key of the current element to a certain location in the array through a function, and locate the array subscript once to complete the operation.
storage location = f(key)

The function f is generally called the hash function, and its design directly determines the quality of the hash table. For example, suppose we want to perform an insertion into a hash table:

Lookup works the same way: compute the actual storage address with the hash function, then fetch the element from the corresponding slot of the array.
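As a toy sketch of this idea (not HashMap's actual code; the class name, the fixed capacity of 16, and the use of a simple modulo as f are all illustrative assumptions), the key-to-slot mapping might look like this:

```java
// Toy illustration of "storage location = f(key)": compress a key's
// hashCode into a valid index of a fixed-size backing array.
public class HashDemo {
    static final int CAPACITY = 16; // assumed fixed capacity for illustration

    // f(key): map any key to an array slot in [0, CAPACITY)
    static int storageLocation(Object key) {
        return Math.abs(key.hashCode()) % CAPACITY;
    }

    public static void main(String[] args) {
        int slot = storageLocation("apple");
        // The computed slot is always a valid index into the backing array
        System.out.println(slot >= 0 && slot < CAPACITY); // prints true
    }
}
```

The same function is used for both insertion and lookup, which is what makes a single positioning step sufficient.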
Hash conflict
However, nothing is perfect. What if two different elements hash to the same storage address? That is, when we hash an element and try to insert it at the resulting address, the slot is already occupied by another element. This is called a hash conflict, also known as a hash collision.

As mentioned above, the design of the hash function is crucial. A good hash function should be simple to compute and distribute addresses as uniformly as possible. However, an array is a fixed-length piece of contiguous memory, and even an excellent hash function cannot guarantee that storage addresses never collide.

So how are hash conflicts resolved? There are many approaches: open addressing (on a conflict, probe for the next unoccupied storage address), rehashing with a second hash function, and separate chaining. HashMap adopts separate chaining, i.e. array + linked list.
2. Implementation principle of HashMap
The backbone of a HashMap is an array of entries. Entry is the basic unit of a HashMap, and each Entry contains a key-value pair.
The initial value is an empty array {}, and the length of the backbone array must always be a power of 2:

```java
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
```
Entry is a static inner class in HashMap, abbreviated below:

```java
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next; // reference to the next Entry in the chain
    int hash;        // the hash computed from the key's hashCode is stored here

    /**
     * Creates new entry.
     */
    Entry(int h, K k, V v, Entry<K,V> n) {
        value = v;
        next = n;
        key = k;
        hash = h;
    }
    // getKey, getValue, setValue, equals, hashCode, toString,
    // recordAccess, recordRemoval omitted
}
```
So, the overall structure of a HashMap is as follows
Simply put, a HashMap consists of an array plus linked lists. The array is the body of the HashMap, and the linked lists exist mainly to resolve hash conflicts. If the located array slot holds no linked list (the current entry's next is null), operations such as lookup and insertion are very fast, requiring only a single addressing step. If the located slot does contain a linked list, insertion is O(n): the list is traversed first, and the value is overwritten if the key already exists, appended otherwise. Lookups likewise need to traverse the list and compare keys one by one via the key object's equals method. So, for performance, the fewer linked lists in a HashMap, the better.
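The array + linked list layout described above can be sketched as a minimal chained map. This is illustrative code, not JDK source: null keys and resizing are omitted, and the capacity is assumed to be a power of two.

```java
// Minimal sketch of the array + linked list structure: each bucket holds
// the head of a singly linked chain of entries.
public class MiniChainedMap<K, V> {
    static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    MiniChainedMap(int capacity) {
        table = new Node[capacity]; // capacity assumed to be a power of two
    }

    private int indexFor(Object key) {
        return key.hashCode() & (table.length - 1);
    }

    public void put(K key, V value) {
        int i = indexFor(key);
        // Traverse the chain; overwrite the value if the key already exists
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        // Otherwise insert at the head of the chain, as JDK 7 does
        table[i] = new Node<K, V>(key, value, table[i]);
    }

    public V get(Object key) {
        for (Node<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        MiniChainedMap<String, Integer> m = new MiniChainedMap<String, Integer>(16);
        m.put("a", 1);
        m.put("a", 2); // same key: overwrite rather than append
        System.out.println(m.get("a")); // 2
        System.out.println(m.get("b")); // null
    }
}
```

Note how both put and get first locate a bucket, then fall back to linear traversal of the chain, which is exactly why long chains hurt performance.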
A few other important fields
```java
// Number of key-value pairs actually stored
transient int size;

// Threshold. While table == {}, this value holds the initial capacity
// (16 by default). Once the table has been filled in, i.e. after memory
// has been allocated for it, threshold is generally capacity * loadFactor.
// HashMap consults threshold when deciding whether to expand, as discussed
// in detail later.
int threshold;

// Load factor, representing how full the table is allowed to get; 0.75 by default
final float loadFactor;

// Used for fail-fast behavior. HashMap is not thread-safe: if the structure
// of the HashMap is changed by another thread (put, remove, etc.) while it
// is being iterated, a ConcurrentModificationException needs to be thrown.
transient int modCount;
```
HashMap has four constructors. If the caller does not pass the initialCapacity and loadFactor parameters, the other constructors use the default values: initialCapacity defaults to 16 and loadFactor defaults to 0.75.

```java
public HashMap(int initialCapacity, float loadFactor) {
    // Sanity-check the initial capacity passed in; it cannot exceed
    // MAXIMUM_CAPACITY = 1 << 30 (2^30)
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);

    this.loadFactor = loadFactor;
    threshold = initialCapacity;
    init(); // init() is empty in HashMap itself; subclasses such as LinkedHashMap override it
}
```
As the code above shows, the ordinary constructors do not allocate memory for the table array (the exception is the constructor whose parameter is a given Map). The table array is actually built when the put operation is first performed.
OK, let’s look at the implementation of the put operation
```java
public V put(K key, V value) {
    // If the table array is still the empty array {}, inflate it (allocate
    // real memory for it) using threshold as the size hint. At this point
    // threshold holds initialCapacity, which defaults to 1 << 4 (2^4 = 16).
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // If the key is null, the entry is stored at table[0] or on table[0]'s conflict chain
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key); // Further mix the key's hashCode to ensure a uniform distribution
    int i = indexFor(hash, table.length); // Get the actual position in the table
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        // If an entry with the same key already exists, overwrite: replace
        // the old value with the new one and return the old value
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++; // Recorded so iteration can fail fast if the structure changes concurrently
    addEntry(hash, key, value, i); // Add a new entry
    return null;
}
```
Let’s start with inflateTable
```java
private void inflateTable(int toSize) {
    int capacity = roundUpToPowerOf2(toSize); // capacity must be a power of 2
    // threshold = min(capacity * loadFactor, MAXIMUM_CAPACITY + 1).
    // capacity never exceeds MAXIMUM_CAPACITY, so the min only matters
    // when loadFactor is greater than 1.
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}
```
roundUpToPowerOf2(toSize) ensures that capacity is the smallest power of two greater than or equal to toSize. For example, toSize = 13 gives capacity = 16; toSize = 16 gives capacity = 16; toSize = 17 gives capacity = 32.
```java
private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}
```
roundUpToPowerOf2 rounds the array length up to a power of 2. Integer.highestOneBit returns a value in which only the leftmost set bit of its argument is retained (all other bits are zeroed).
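This behavior can be checked with a small standalone sketch that copies the roundUpToPowerOf2 expression (the class name and test values below are just for illustration):

```java
public class RoundUpDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Same expression as JDK 7's roundUpToPowerOf2
    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOf2(13)); // 16
        System.out.println(roundUpToPowerOf2(16)); // 16
        System.out.println(roundUpToPowerOf2(17)); // 32
        // Integer.highestOneBit keeps only the leftmost set bit:
        // 25 = 0b11001 -> 0b10000 = 16
        System.out.println(Integer.highestOneBit(25)); // 16
    }
}
```

Note the `(number - 1) << 1` trick: subtracting 1 first is what makes an exact power of two map to itself instead of the next power up.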
The hash function
```java
// This "magic" function applies a series of shifts and XORs to further mix
// the key's hashCode, adjusting its binary representation so that the final
// int is distributed as uniformly as possible
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```
The value computed by the above hash function is further processed by indexFor to obtain the actual storage location
```java
/**
 * Returns index for hash code h.
 */
static int indexFor(int h, int length) {
    return h & (length - 1);
}
```
h & (length-1) guarantees that the resulting index stays within the bounds of the array. For example, with the default capacity 16, length - 1 = 15, and suppose h = 18:

```
    1 0 0 1 0
&   0 1 1 1 1
_____________
    0 0 0 1 0  = 2
```
The final index is 2. Some implementations use the modulo operation for this calculation, which also keeps the index within the array bounds, but bitwise operations are more efficient for computers (HashMap makes heavy use of them).
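A quick sketch (the class name is illustrative) confirming both the worked example above and that, for a power-of-two length and non-negative h, the & operation agrees with modulo:

```java
public class IndexForDemo {
    // JDK 7's indexFor; valid only when length is a power of two
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        System.out.println(indexFor(18, length)); // 2, matching the example above
        // For non-negative h and power-of-two length: h & (length-1) == h % length
        for (int h = 0; h < 100; h++) {
            if (indexFor(h, length) != h % length) {
                throw new AssertionError("mismatch at h=" + h);
            }
        }
        System.out.println("h & (length-1) matched h % length for h in [0,100)");
    }
}
```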
Therefore, the process of determining the final storage location is as follows:
Let’s look at the implementation of addEntry:
```java
void addEntry(int hash, K key, V value, int bucketIndex) {
    // When size exceeds the threshold and a hash conflict is about to
    // occur on this bucket, expand the array
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }
    createEntry(hash, key, value, bucketIndex);
}
```
According to the code above, when a hash conflict occurs and size exceeds the threshold, the array must be expanded: a new array twice the length of the old one is created, and every element of the current Entry array is transferred into it. Expansion is therefore a relatively expensive operation.
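Because expansion is expensive, a common mitigation (sketched below with an assumed element count of 1000; the class name is illustrative) is to pass an initial capacity large enough that expected * loadFactor never crosses the threshold during bulk insertion:

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 1000; // assumed number of entries to be inserted
        // Choose a capacity such that expected <= capacity * 0.75,
        // avoiding repeated resize/transfer passes during bulk insertion
        int initialCapacity = (int) (expected / 0.75f) + 1;
        Map<Integer, Integer> map = new HashMap<Integer, Integer>(initialCapacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, i * i);
        }
        System.out.println(map.size()); // 1000
    }
}
```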
3. Why must the array length of a HashMap be a power of 2?
Let’s continue with the resize method mentioned above
```java
void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
```
When the array is expanded its length changes, and since the storage position is index = h & (length - 1), the index may change too and must be recalculated. Let's first look at the transfer method:
```java
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    // The loop below walks each linked list entry by entry, recalculates
    // the index position, and copies the old array's data into the new
    // array (the array holds references, not the actual objects, so only
    // the references are copied)
    for (Entry<K,V> e : table) {
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            int i = indexFor(e.hash, newCapacity);
            // newTable[i] may be empty or may already hold an entry chain;
            // either way, insert directly at the head of the list
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}
```
This method iterates over the data in the old array, list by list, and redistributes it into the new, expanded array. The array index is computed by hashing the key's hashCode and then AND-ing the result with length - 1 to obtain the final position.

The array length of a HashMap must be a power of 2. For example, 16 in binary is 10000, so length - 1 is 15, binary 01111. Likewise, after expansion the array length is 32, binary 100000, and length - 1 is 31, binary 011111.

As can be seen from the figure below, all the low bits of length - 1 are 1, and after expansion it differs in only one position: the 1 added at the leftmost end. So when computing h & (length - 1), as long as the bit of h corresponding to that leftmost differing bit is 0, the new index is guaranteed to equal the old index, which greatly reduces the repositioning of data that was already well distributed in the old array.

In addition, keeping the array length a power of 2 means all low bits of length - 1 are 1, which makes the resulting indices more uniform. For example:

Notice that in the & operation above, the high bits do not affect the result (the hash function's various operations exist precisely to mix the low bits well); we focus on the low bits. If the low bits of length - 1 are all 1, then any change in the low bits of h changes the result: there is exactly one low-bit pattern of h that yields, say, index = 21. This is why the array length must be a power of 2.

If the length is not a power of 2, some low bits of length - 1 are 0, so the low-bit pattern of h that produces index = 21 is no longer unique, making hash collisions more likely; moreover, any index bit whose corresponding bit in length - 1 is 0 can never be 1 at all, so those array positions are wasted.
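A small sketch (class name illustrative) that counts which indices are reachable for a power-of-two length versus a non-power-of-two one:

```java
import java.util.HashSet;
import java.util.Set;

public class CapacityChoiceDemo {
    public static void main(String[] args) {
        // With length = 16, length-1 = 0b1111: every slot 0..15 is reachable.
        // With length = 15, length-1 = 0b1110: the lowest bit of the index
        // is always 0, so no odd slot can ever be used.
        Set<Integer> reachable15 = new HashSet<Integer>();
        Set<Integer> reachable16 = new HashSet<Integer>();
        for (int h = 0; h < 10000; h++) {
            reachable15.add(h & (15 - 1));
            reachable16.add(h & (16 - 1));
        }
        System.out.println(reachable16.size()); // 16 distinct indices
        System.out.println(reachable15.size()); // only 8: half the table is wasted
    }
}
```

This is exactly the waste described above: with length 15, odd indices are unreachable and collisions on the even slots become more likely.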
The get method
```java
public V get(Object key) {
    // If the key is null, look it up directly in table[0]
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}
```
The get method returns the value corresponding to a key; if the key is null, it is retrieved directly from table[0]. Let's look at the getEntry method:
```java
final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    // Compute the hash value from the key's hashCode
    int hash = (key == null) ? 0 : hash(key);
    // indexFor (hash & (length-1)) locates the final array index; then the
    // linked list is traversed, finding the matching entry via equals
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
```
So get works roughly as: key(hashCode) --> hash --> indexFor --> final index position table[i]. If table[i] holds a linked list, it is traversed to find the matching entry via the key object's equals method. One might think that the e.hash == hash check is unnecessary when traversing the list after locating the bucket, and that equals alone would suffice.

That is not the case. Suppose the key object passed in overrides equals but not hashCode, and it happens to land in this bucket. Using equals alone it might test as equal even though its hashCode does not match the current entry's. By the contract of Object.hashCode, such an object must not be treated as the stored key, and null should be returned instead, as the following example illustrates.
4. Overriding equals requires overriding hashCode as well
The Java convention, stated in the Object class documentation, is that whenever you override equals you should also override hashCode; otherwise classes that depend on hash codes, such as HashMap, may not behave correctly. Consider the following example:
```java
import java.util.HashMap;

/**
 * Created by chengxiao on 2016/11/15.
 */
public class MyTest {
    private static class Person {
        int idCard;
        String name;

        public Person(int idCard, String name) {
            this.idCard = idCard;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (o == null || getClass() != o.getClass()) {
                return false;
            }
            Person person = (Person) o;
            // Two Person objects are considered equal if their idCard matches
            return this.idCard == person.idCard;
        }
    }

    public static void main(String[] args) {
        HashMap<Person, String> map = new HashMap<Person, String>();
        Person person = new Person(1234, "Xiao Feng");
        map.put(person, "Demi-Gods and Semi-Devils");
        // Look up with a new, logically equal key (same idCard)
        System.out.println("Result: " + map.get(new Person(1234, "Xiao Feng")));
    }
}
```
Actual output:

```
Result: null
```
If you already understand how HashMap works, this result is easy to explain. Although the keys used for put and get are logically equal (they compare equal via equals), hashCode was not overridden, so on put the chain is key(hashCode1) --> hash --> indexFor --> final index, while on get it is key(hashCode2) --> hash --> indexFor --> final index. Since hashCode1 does not equal hashCode2, the lookup lands on a different array position and returns the logically incorrect value null. (It might also land on the same position by accident, but the entry's stored hash would then be compared and found unequal, as discussed for the get method above.)

So when overriding equals, take care to override hashCode as well, ensuring that any two objects that are equal via equals also return the same integer from hashCode. If equals judges two objects unequal, their hashCodes may still happen to be the same; this is simply a hash collision, which should be minimized but is permitted by the contract.
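As a sketch of the fix (this class is a hypothetical variant of the MyTest example above; Objects.hash is just one convenient way to implement hashCode), overriding hashCode so that equal idCards yield equal hash codes makes the lookup succeed:

```java
import java.util.HashMap;
import java.util.Objects;

public class MyTestFixed {
    private static class Person {
        int idCard;
        String name;

        Person(int idCard, String name) {
            this.idCard = idCard;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            return this.idCard == ((Person) o).idCard;
        }

        // The missing piece: objects equal via equals now produce the same
        // hash code, so both keys map to the same bucket
        @Override
        public int hashCode() {
            return Objects.hash(idCard);
        }
    }

    public static void main(String[] args) {
        HashMap<Person, String> map = new HashMap<Person, String>();
        map.put(new Person(1234, "Xiao Feng"), "Demi-Gods and Semi-Devils");
        // The lookup now succeeds instead of returning null
        System.out.println(map.get(new Person(1234, "Xiao Feng"))); // prints "Demi-Gods and Semi-Devils"
    }
}
```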
5. Summary
This article described the implementation principle of HashMap and analyzed its source code, covering some source-level details and design rationale along the way, and finally explained briefly why hashCode should be overridden whenever equals is. I hope this article helps you; discussion and corrections are welcome. Thanks for your support!
Source: cnblogs.com/chengxiao/p/6059914.htm