This article has participated in the third “Topic writing” track of the Nuggets Creators Training Camp, check out the details:Digg project | Creator Boot Camp phase 3 is underway, “write” to make a personal impact.
π preface
With fireworks to make a living with poetry to love
Whether I go to record or not, in the past and future infinite years, there will still be someone to rise, someone to sink, someone to become a hero, someone to play a clown, someone to stand up, someone lost to sink. But you only need — “self-discipline in prosperity, self-healing in abjuration, do not abandon conscience on the way to make a living, do not lose on the way to love the most strict, this time I stand in the wind, let you fog everywhere.”
π HashMap
Design principle of
1. Design idea of HashMap:Copy the code
- A Map is a container that stores data in key-value pairs
HashMap
I’m using the keyKey
ηhashcode
Organize storage by value, making it very fast and efficient to work by key valuekey
Access to data. - For key-value pairs,
HashMap
Internally it will be encapsulated into a correspondingEntry
Objects, that is,Entry
Object is an organization of key-value pairs; - For each object,
JVM
I’m going to generate one for ithashcode
Value.HashMap
Storing key-value pairsEntry
Will be based onKey
ηhashcode
Value, in some sort of mapping, determines how the key-value pair should be pairedEntry
Stored in theHashMap
At what position in; - When using
Key
Value when taking data, and then according toKey
The value of thehashcode
εInternal mapping condition
, directly locate toKey
The correspondingValue
Where can values be stored so efficientlyValue
Value.
π HashMap
Design of hash function
The hash function takes the hashcode of the key, which is a 32-bit int, and then xor the high and low 16 bits of the Hashcode
Also known as the perturbation function, and there are two reasons for this design
- Be sure to keep it as low as possible
hash
Collision, the more scattered the better; - The algorithm has to be as efficient as possible, because it’s high-frequency, so it’s bitwise;
π What is Hash?
A Hash is a Hash algorithm that transforms an input of arbitrary length into an output of fixed length, which is the Hash value. Ok, this definition might be a bit abstract. Let's explain it sentence by sentence with the example of plan TWO mentioned above.Copy the code
-
For example, a person’s name is an arbitrary length of input. Z – zhang SAN 11; L- Li Si 22; W – fifty and 33
-
And then, go straight to the Z category and find it. But if there are a lot of people whose names start with Z, such as Z-Zhao Liu44, Z-Zhou Qi55, when I save more phones, I also need to find several times. I do a sorting by the uppercase of the first letter of the name in a way we can think of as a hash algorithm.
-
Finally, I divided the book into initials and recorded phone numbers — fixed-length output, meaning I had all the phones in 26 categories. So these 26 letters are hashes.
π± π HashMap
Data structures for JDK 1.8
This article is aboutJDK1.8
Version, internal use array + linked list red black tree
-
in
Jdk1.8
δΈHashMap
Some changes have been made to the implementation of the, but the basic idea remains the same, just in a few placesTo optimize the
Here’s a look at some of the changes:Data structures are stored by array + linked list
Change toArray + linked list + red-black tree storage
, when the list length exceedsThreshold (8)
When willLists are converted to red-black trees
. Performance has been further improved. -
Array: The array storage interval is continuous, occupying a large amount of memory, so the space complexity is very large. But the binary search time complexity of array is O(1). The characteristics of arrays are: easy to address, insert and delete difficult; Space complexity: refers to the memory space required to execute the algorithm. Time complexity: The computational effort required to execute the algorithm
-
Linked list: Linked list storage interval is discrete, occupies relatively loose memory, so the space complexity is small, but the time complexity is large, up to O (N). Linked lists are difficult to address and easy to insert and delete.
-
Red and black tree:
π JDK1.8
ε―Ή HashMap
What are the major optimizations?
-
Array + list changed to array + list or red-black tree;
-
The insertion method of the linked list is changed from ab initio insertion method to tail insertion method. Simply speaking, when inserting, if there are already elements in the array position, 1.7 place the new element into the array, and the original node is the successor node of the new node. 1.8 traverse the linked list and place the element to the end of the linked list.
-
For expansion, 1.7 needs to rehash the elements in the original array to the new location. 1.8 adopts simpler logical judgment, with unchanged position or index + old capacity.
-
1.7 Determine whether to expand the capacity and then insert the disk. 1.8 Determine whether to expand the capacity after the disk is inserted.
β¨ why do these points of optimization
-
Reduce the time complexity from O(n) to O(logn) to prevent hash conflicts.
-
Because in the expansion of 1.7 header insertion method, the header insertion method will reverse the linked list and generate rings in multi-threaded environment; Thread A is inserting node B, and thread B is inserting as well. When the capacity is insufficient, thread B begins to expand, hash again, place elements, adopt the head insertion method, and put the head of node B after traversing, thus forming A ring, as shown in the following figure:
The transfer code is called for expansion in 1.7, as shown below:
void transfer(Entry[] newTable, boolean rehash) { int newCapacity = newTable.length; for (Entry<K,V> e : table) { while(null ! = e) { Entry<K,V> next = e.next; if (rehash) { e.hash = null == e.key ? 0 : hash(e.key); } int i = indexFor(e.hash, newCapacity); e.next = newTable[i]; NewTable [I] = e; newTable[I] = e; e = next; }}}Copy the code
- Why does 1.8 not need to be reconfigured for expansion
hash
You can directly locate the original node in the new data position- This is because the expansion is twice the size of the original array, and the mask used to calculate the location of the array is only an extra 1. How to solve this problem?
- Before expansion, the length is 16, and the binary n-1 used to compute (n-1) & hash is 0000 1111. After expansion to 32, the binary is one higher, 0001 1111.
- Since it is an ampersand, 1 and any number ampersand is itself, there are two cases, as shown in the following figure: the case of the original hashcode where the 4th bit is 0 and the 4th bit is 1;
- The fourth bit is 0, the rehash value remains the same, and the fourth bit is 1, the rehash value is 16 larger than the original value.
π finally
-
For more references, see here:Chen Yongjia’s blog
-
Like the small partner of the blogger can add a concern, a thumbs-up oh, continue to update hey hey!