## Background

There are two main reasons I pushed myself to study source code. First, senior engineer interviews always ask about it. Second, there are more and more frameworks these days and I don't have the energy to study them all, so I decided to dig into the underlying fundamentals instead and see how the experts write code. This is the first step out the door; there will be more and more source-code learning notes to share with you, and I hope you'll offer valuable feedback. So without further ado, let's get straight to today's topic.

## Development environment

| Development tool | JDK version |
| ---------------- | ----------- |
| IDEA 2020        | JDK 1.8     |
## Throwing out a brick to attract jade
How many classic interview questions can HashMap (JDK 1.8) spawn? Quite a few:
  1. Why is the initial capacity of a HashMap a power of two?
  2. When does a HashMap resize?
  3. How does a HashMap resize?
  4. What is HashMap's underlying data structure?
  5. Why did JDK 1.8 introduce red-black trees into HashMap?
  6. When does HashMap convert a linked list to a red-black tree?
  7. What can go wrong when a HashMap is used by multiple threads?
  8. How does HashMap's hash algorithm work?
  9. How does a HashMap locate a key's position in the array?

I'm sure you've been asked some of these before. Isn't it surprising that a HashMap can produce so many interview questions? In this article I will walk you through HashMap's source code, and by the end you will know how to answer every question above. Let's get started.

Before we get into the code, I want to talk you through HashMap's data structure.

  1. First of all, the bottom layer of HashMap is an array. Why an array? Because index-based access is extremely fast. So in the beginning it is simply an array, with an initial length of 16.

  2. Then I insert a key. How is its position calculated? Its hash code is computed, producing an integer, and taking that integer modulo 16 scatters the data across positions 0 to 15. The JDK actually uses a much cleverer way to compute this position, which I'll explain later; you'll find the trick elegant.

  3. As more and more data is stored, sooner or later a position is already occupied. What then? We can't overwrite it, so a new member arrives: the linked list. When positions collide, the entries at that position are stored as a linked list, continuing downward from that slot.

  4. Once we add the linked list, our storage problem is solved, but as the list grows longer and longer, lookups suffer: a linked list is quick to insert into and delete from, but searching it is O(n), traversing node by node.

  5. So it became necessary to introduce another new member: the red-black tree. A red-black tree is one implementation of a balanced binary tree (the data structure itself will be explained in a later article), and it keeps lookups fast even when many entries collide.

OK, that's the HashMap structure; a simplified sketch of it follows below. Now let's analyze the code step by step to resolve our doubts.
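To make the picture concrete, here is a minimal sketch of the bucket structure described above. This is my own simplified illustration, not the real JDK classes (the actual HashMap.Node has more to it):

```java
// A simplified sketch of HashMap's bucket structure, not the actual JDK code.
class SimpleNode<K, V> {
    final int hash;        // cached hash of the key
    final K key;
    V value;
    SimpleNode<K, V> next; // chains entries that land in the same bucket

    SimpleNode(int hash, K key, V value, SimpleNode<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

class SimpleBucketTable<K, V> {
    // the bucket array itself; the default initial length is 16
    @SuppressWarnings("unchecked")
    SimpleNode<K, V>[] table = (SimpleNode<K, V>[]) new SimpleNode[16];
}
```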

## Code sample

A simple main method; following it, we step into the world of Map.

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        // write your code here
        Map<String, String> map = new HashMap<>(27);
        map.put("name", "Here we go.");
    }
}
```

Let’s go to the new HashMap and see what the constructor does for us

```java
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
```

You can see that it calls another constructor, passing along DEFAULT_LOAD_FACTOR, which is the default load factor, 0.75.

```java
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }
```

So let's break this code down:

  1. Check that the initial capacity we set is valid (it must not be negative).
  2. Check whether the initial capacity is greater than the maximum capacity, MAXIMUM_CAPACITY = 1 << 30. It rarely is, but if it is, the maximum value is used as the initial capacity instead.
  3. Check that the load factor is valid; we'll discuss what the load factor is for later, when we get to expansion.
  4. tableSizeFor computes the smallest power of two greater than or equal to the initial capacity: for 15 that is 16, and for 7 it is 8. Let's see how that works; if you're interested, a runnable demo follows the snippet below. So the actual capacity of a map is always a power of two, not necessarily the number we set.
```java
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
```
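If you want to see it in action, here is a small standalone demo you can actually run; the method body is copied from the snippet above, since the JDK's version is package-private (MAXIMUM_CAPACITY is 1 << 30):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // copied from java.util.HashMap for demonstration purposes
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(7));  // 8
        System.out.println(tableSizeFor(15)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(27)); // 32: the capacity our example map actually gets
    }
}
```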

Now that the constructor analysis is done, let's go back to where we started: put. Click in and take a look; this is the core of what we'll cover this time.

```java
    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
```

We see that the put method delegates to putVal to carry out the actual logic. Before we follow it in, notice that the key is hashed first; this is exactly interview question 8, so let's click into hash and see what it does.

```java
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
```

You won't be disappointed: there are only three lines of code in there. What can three lines do? They reduce the chance of a hash collision. What is a hash collision? As mentioned at the start, it's when different pieces of data get the same index; those entries are then stored as a linked list, and the list can grow too long. To make the hashing more uniform, the JDK takes some measures. Let's analyze the code:

  1. If the key is null, a hash of 0 is returned.
  2. Calling hashCode() on the key produces a 32-bit integer.
  3. We know an int takes four bytes, i.e. 32 bits: treat the first 16 bits as the high half and the last 16 bits as the low half. Shifting the value right by 16 and XORing it with the original mixes the high bits into the low bits, giving a new value. Why do this? Because the index calculation only uses the low bits, so letting the high bits participate too reduces the probability of collisions (see the sketch after this list).
  4. Return the new hash value.
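Here is a quick sketch of that effect. The two hash codes below are made-up examples that differ only in their high 16 bits; without the spread step they would collide in a 16-slot table:

```java
public class HashSpreadDemo {
    // the same mixing step HashMap.hash() applies to a non-null key's hashCode
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;          // table length
        int h1 = 0x00010000; // differs from h2 only in the high 16 bits
        int h2 = 0x00020000;
        // without spreading, both land at index 0: a collision
        System.out.println(((n - 1) & h1) + " and " + ((n - 1) & h2));
        // with spreading, they land at indexes 1 and 2
        System.out.println(((n - 1) & spread(h1)) + " and " + ((n - 1) & spread(h2)));
    }
}
```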

With the hash computed, let's go back to our previous method, putVal.

```java
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
```

## Array initialization

This method is very long and needs to be read slowly and carefully. Let's start parsing from the first line.

  1. First we declare a Node array tab, a Node p, and two ints.

  2. Then check whether table is null or its length is zero. What is this table? It is the field holding the Node array we described earlier: it is null on the first put and populated on later puts.

  3. If this is our first put, resize() is called for us, i.e. it initializes a Node array of some length. Incidentally, this same resize method is also used when the array is expanded.

```java
    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1; // double threshold
        }
        else if (oldThr > 0) // initial capacity was placed in threshold
            newCap = oldThr;
        else {               // zero initial threshold signifies using defaults
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if (oldTab != null) {
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if ((e = oldTab[j]) != null) {
                    oldTab[j] = null;
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                else
                                    hiTail.next = e;
                                hiTail = e;
                            }
                        } while ((e = next) != null);
                        if (loTail != null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        if (hiTail != null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
```

This code is really long... but that's fine, we're here to get to the bottom of it. Step by step, then. First, the Node array from before the resize is saved as oldTab, along with the old array length and the old threshold (load factor * array length).

  1. First check whether the old array capacity is greater than 0 (this branch handles expansion). If it is, check whether it has already reached the maximum capacity MAXIMUM_CAPACITY; if so, the threshold is set to Integer.MAX_VALUE and oldTab is returned as-is, since we can't grow any further. Otherwise the capacity is doubled, and the threshold is doubled along with it.

  2. Otherwise, if the old threshold is greater than 0, the old threshold is used as the new array's capacity. Remember that the constructor converted our initial capacity to a power of two and stored it in threshold; that value now initializes the Node array.

  3. If the old threshold is 0 as well, the map's default initial capacity of 16 and load factor of 0.75 are used to compute the array capacity and the threshold.

  4. Why does the code check newThr == 0 at the end? Because if newThr is still 0, this must be the first initialization of the array through a branch that never computed the threshold, so it has to be computed now as capacity * load factor.

  5. At this point the new array has been sized, whether for expansion or first initialization (a condensed sketch of the sizing logic follows this list).

  6. Then comes the check if (oldTab != null). If the old array is null, this was a first initialization and the new array can be returned directly; if it is not null, this is an expansion and the data has to be migrated afterwards.
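As that condensed view, here is a sketch of just the sizing decision. This is my own simplification, not JDK code; it omits the MAXIMUM_CAPACITY overflow checks the real method performs:

```java
// A simplified sketch of resize()'s sizing decision (overflow checks omitted).
static int[] newSizing(int oldCap, int oldThr, float loadFactor) {
    int newCap, newThr;
    if (oldCap > 0) {             // an existing table is being expanded
        newCap = oldCap << 1;     // double the capacity...
        newThr = oldThr << 1;     // ...and double the threshold with it
    } else if (oldThr > 0) {      // the constructor stashed the (power-of-two)
        newCap = oldThr;          // initial capacity in the threshold field
        newThr = (int) (newCap * loadFactor);
    } else {                      // no-arg construction: use the defaults
        newCap = 16;              // DEFAULT_INITIAL_CAPACITY
        newThr = 12;              // (int) (16 * 0.75)
    }
    return new int[]{newCap, newThr};
}
```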

## Data migration

  1. Iterate over all the elements in the old array.

  2. Check whether the current bucket is empty, and migrate its data if it is not.

  3. The next three branches determine whether the bucket holds a single node, a red-black tree node, or a linked list.

  4. If it is a single node, ANDing e's hash with (new capacity - 1) gives the element's index in the new array. Remember I said the JDK uses a very clever method to locate elements? Here is how it works. Given a number that we want to land somewhere between 0 and 15, the obvious idea is to take it modulo 16. But there is another way: for 19, 19 % 16 is 3, and 19 & (16 - 1) is also 3. You may well wonder why, so let me briefly explain.

For this to work, the right-hand value must be 2^n - 1, so that all of its bits are 1. That is exactly why the map requires the array size to be a power of two. We then AND a mask like 1111 with our hash.

For example, 11111000110011 & 1111 leaves 0011, which is 3; and 111111111111111 & 1111 leaves 1111, which is 15. The result can never exceed 15, so the index always falls inside the array. A tiny check of this equivalence follows below.
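Don't take my word for it; this small check confirms the equivalence for a power-of-two length (with a bonus: & also behaves well for negative hash values, where % would give a negative remainder):

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // must be a power of two for the trick to work
        for (int hash : new int[]{3, 19, 35, 12345}) {
            System.out.println(hash + " % 16 = " + (hash % n)
                    + ", " + hash + " & 15 = " + (hash & (n - 1)));
        }
        System.out.println(-3 % 16);       // -3: not a valid index
        System.out.println(-3 & (16 - 1)); // 13: still inside the array
    }
}
```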

  5. Then, if the node is a red-black tree node, the corresponding red-black tree operations are performed.
  6. Finally, if it is a linked list, the list is traversed and migrated.
  7. A key operation here is splitting the elements that shared an index evenly across the new array.
  8. Each element's hash is ANDed with the old array length. If the result is 0, the element keeps its original index in the new array. If it is not 0, the element's relevant high bit is 1, and ANDing the hash with (new length - 1) would yield the original index plus the old array length; so the code simply adds the old array length to the current index and places the node at that slot in the new array. This might leave you quite confused about what all this means, so let me draw a picture.

Looking at the picture, there are two tables: the array before expansion and the array after expansion.

  1. At index 7 of the oldTab array sit elements whose hashes are 7 and 15: 7 & (8 - 1) and 15 & (8 - 1) both give 7, so both are stored at index 7.

  2. Now we expand the array to 16. ANDing the two hashes with the new length minus one gives positions 7 and 15, which splits the list evenly between the two slots.

  3. But look at the code: the JDK doesn't redo the full computation. It first ANDs the hash with the original array length itself, not length minus one as before. If the result is 0, the element's index in the new array stays the same, and the data is placed directly at that position; if it is not 0, the new index is just the current index plus the old array length, because the new array is exactly twice the size of the old one. The small check below verifies this rule.
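Here is that check, using the example hashes 7 and 15 from above with the old capacity of 8 growing to 16:

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 8, newCap = 16;
        for (int hash : new int[]{7, 15}) {
            int oldIndex = hash & (oldCap - 1); // both land at index 7 before the resize
            int newIndex = (hash & oldCap) == 0
                    ? oldIndex                  // "lo" half: index unchanged
                    : oldIndex + oldCap;        // "hi" half: index + old capacity
            System.out.println("hash " + hash + ": " + oldIndex + " -> " + newIndex
                    + " (full recompute gives " + (hash & (newCap - 1)) + ")");
        }
    }
}
```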

With that said, array initialization is done; let's go back to the code.

```java
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
```

With the first branch handled, let's move on.

  1. if ((p = tab[i = (n - 1) & hash]) == null): if the computed index in the array is empty, create a new Node and place it in that slot.
  2. If it is not empty, continue down.
  3. If the current node's hash and key match the key we passed in, take that node; we will operate on it directly.
  4. If the node is a red-black tree node, the red-black tree insertion is performed.
  5. If none of the above conditions are met, we are dealing with a linked list structure.
  6. Iterate over the elements in the list.
  7. If a node with the same key is found in the list, take that node and stop iterating.
  8. If we reach the end of the list without finding one, create a new node and append it.
  9. In either case the loop counts with binCount. What is this variable for? You can see it is incremented once per list node visited, so it is essentially the length of the list.
  10. If the list length reaches TREEIFY_THRESHOLD - 1, i.e. 7, the list is converted to a red-black tree and the data is stored in the tree from then on. So when does a linked list become a red-black tree? The map defines the threshold as 8, and we subtract 1 because before the traversal starts we have already taken the first list element out of the array slot; each iteration looks at the next element, so TREEIFY_THRESHOLD - 1 is used as the conversion condition.
  11. Finally, let's look at this code:
```java
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
```

If e is not null, the key already exists in the map, so we simply assign the new value and return the old one. Note that the method returns here, so an update never reaches the code below: size is only incremented for a genuinely new entry. After incrementing, if the number of elements exceeds the threshold, the array is expanded.
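A quick illustration of those return-value semantics; the key and values here are just examples:

```java
import java.util.HashMap;
import java.util.Map;

public class PutReturnDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        System.out.println(map.put("name", "first"));  // null: brand-new key, size becomes 1
        System.out.println(map.put("name", "second")); // "first": an update, size stays 1
        System.out.println(map.size());                // 1
    }
}
```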

The most important operations on a map are put, get, and remove. Having covered put, you can follow the same approach to read get and remove. The interview questions from the beginning have all been answered along the way. If anything in this article is inaccurate, I hope you will correct me.