preface

Opening diagram

The main points of

  1. Java8 makes changes to the HashMap of Java7, the biggest difference being the use of red-black trees.

  2. In The Java7 structure, when looking for data, we quickly locate the index of the array based on the hash value. The speed of the query depends on the length of the linked list, and the time complexity is O(n).

  3. To reduce the problems in 2, in Java8, lists are converted to red-black trees when the number of lists is greater than 8. So when you look up data in a red-black tree, the time complexity changes order logN.

structure

chart

describe

  1. Arrays hold nodes.

  2. Node if it’s a linked list, TreeNode if it’s a red-black tree.

  3. In Node, keyt, Value, Hash, and next are the same as in Java7.

  4. We determine whether it is a red-black tree or a linked list based on the type of nodes stored in the array

A constructor

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }
Copy the code

This constructor initializes the threshold and load factor.

In the constructor, we do not specify the size of the HashMap. Even if we use the HashMap(int initCapacity) constructor, the result of the number operation is only the initialization threshold, and the internal array is not immediately constructed. The initialization threshold is used to facilitate initialization of arrays during subsequent PUT. The details are left to the reader below, but the inner array is not initialized at constructor execution.

tableSizeFor

The function simply takes the incoming number up to the nearest power of 2 (including equal). For example, if 15 is passed in, 16 is returned; Pass in 18, return 32. Pass in 32, return 32. Let’s see how he does it!

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Copy the code

explain

Before explaining the source code, let me say that passing in a cap, which may not be a power of 2, is to find the smallest power of 2 greater than or equal to cap

How do I find it? Let’s look at the example we started with (18-1 first, and we’ll talk about this -1 later).

0000000000000000, 00000000, 0001 0001 Decimal system: 17

00000000000000000000, 00000000, 0010 0000 Decimal: 32

Now let’s see what 32 minus 1 is, compared to 32 and 17

00000000000000000000, 00000000, 0010 0000 Decimal: 32

0000000000000000, 00000000, 0001 1111 Decimal system: 31

0000000000000000, 00000000, 0001 0001 Decimal system: 17

We can see that if we change the last five bits of 17 to 1, it becomes binary 31, and if we add 1 to it, it becomes binary 32, and that’s what we want! In short, the purpose of this function is to start from the left 1 and go all the way to the right, and then +1 will do what we want!

process

n|=n>>>1

Since n is greater than 0, there must be a 1 at the top of the binary, so if you move an unsigned 1 to the right, you must be closer to the original number and the next number becomes 1. Such as 10 XXXX, then the n | = n > > > 1, must be 11 XXXX.

n|=n>>>2

In the example above, n has changed from 10xxxx to 11xxxx. So we’re going to make the following XXXX continue to be 1, so we’re going to make the first two digits of XXXX continue to be 1, so we’re going to make the first two digits of XXXX continue to be equal to 11. So if you move two unsigned Spaces to the right and align with yourself, you go from 11xxxx to 1111xx.

So now we have four ones, so we’re going to move by four. If you change it to eight ones, you move it eight bits….

And so on.

To change the 32 bits to all 1s, change the first 16 bits to all 1s, and then move the next 16 bits to the right to make all 32 bits to all 1s.

MAXIMUM_CAPACITY(1<<30); MAXIMUM_CAPACITY(1<<30);

Let me show you a picture, maybe it will be clearer. In fact, it is very clear!

summary

So here’s why you have to subtract 1 when it comes in. As we all know,

You should understand the above explanations and examples. If you go from 1 to the left of binary to the right, it’s going to be 1. If you pass in exactly two to the m, then the binary of n will be one plus m ones, and when you return plus one, it will be one plus m plus one zero. So it’s twice the number that came in. For example, if you pass in 32, the binary number is 100000. the 1 is followed by five zeros. If n is changed to 111111 and +1 is returned, it returns 1000000, which is 64 in decimal. This is clearly against our logic.

And anything else that’s not a power of two, whether you subtract 1 or not, as long as the 1 on the left doesn’t change, whatever’s on the right is going to be greater than or equal to the smallest power of two that’s passed in.

To sum up, the cap-1 passed in is to prevent the cap passed in from being an exact power of 2, so as to avoid doubling when it is returned later.

Put

public V put(K key, V value) {
    // For hash functions, we'll see later
    return putVal(hash(key), key, value, false.true);
}

//---------putVal

//onlyIfAbsent True indicates that the key does not exist and the key does not exist
// false: the key exists and is stored, but overwrites and returns the old value
 final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
     	// The first put executes the resize() in the following if
     	// The first resize is the equivalent of initialization, usually set to 16.
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
     	// For the first expansion, (n-1) &hash is the same as modulo n of hash
     	// If you take the hash value modulo 15, you can get a random index
     	// If there is no value for this position, then simply initialize Node and place it where the hash is mapped
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
     	// The hash mapped data already has nodes
        else {
            Node<K,V> e; K k;
            // If the first node is the one we are looking for, then e executes the first node
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                e = p;
            // If the first node is the root of a red-black tree, the red-black tree put operation is called
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            // At this point, the desired key may be placed after the first node or the key may not exist in the list
            else {
                // The binCount count is used to record whether eight nodes have been reached to transform into a red-black tree
                for (int binCount = 0; ; ++binCount) {
                    // If the last node is reached, the key does not exist
                    if ((e = p.next) == null) {
                        // The new key node is placed after the last node in the list, and e is already null
                        p.next = newNode(hash, key, value, null);
                        // If the number of nodes reaches 7, then the number of nodes becomes 8, and then the tree is transformed into a red-black tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        // There are two cases: 1 is successfully added to the end of the list and the total number does not exceed 8, 2 is added to the end of the list and the total number reaches 8, then it becomes a red-black tree and exits the loop
                        break;
                    }
                    // If the key already exists in the list, exit the list and overwrite it later
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        break;
                    // Here is the operation that has not been found, traversing the listp = e; }}// If e is not null, the put key is already in the list.
            // If it is null, it is not in the previous list and has been added to the end of the previous list
            if(e ! =null) { 
                V oldValue = e.value;
                //1 onlyIfAbsent, as mentioned earlier, is false to override the old value
                //2 or no value before
                //1 or 2 execute the following if
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                // This function is only used in LinkedHashMap, which is empty
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
     	// If the threshold is exceeded after the node is added, the capacity is expanded
        if (++size > threshold)
            resize();
     	// This function is only used in LinkedHashMap, which is empty
        afterNodeInsertion(evict);
        return null;
    }

Copy the code

The hash method

static final int hash(Object key) {
    int h;
    return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
}
Copy the code

In Java, a hash function is a native method defined in the Object class, so all objects inherit it.

public native int hashCode(a);
Copy the code

Because this is a native method, the exact implementation is uncertain, but as you can see from the function signature, the method maps any object to an integer value. Calling this method completes the mapping of Object -> int

We see this in the hash implementation

  1. If the key is null, the value is placed at the first position in the array.

  2. If the key is not null, then the key’s hashCode is moved 16 bits to the right and then xor with itself.

You may have some questions about point 2

That is, by making the high and low bits of HashCode xor, the high bits interfere with the low bits. The goal is to make the array subscripts of hashCode maps more even. The following paragraph is quoted from the forum anonymous user’s explanation, I think the explanation is very detailed

Author: Anonymous user

Link: www.zhihu.com/question/20…

Source: Zhihu

We create a HashMap with an entry array of a default size of 16. Now there is a pair of keys and values that needs to be stored in the hashmap. The hashcode of the key is 0ABC0000 (eight hexadecimal numbers, total 32 bits). If the hashcode is not processed by the hash function, This pair will later be stored at subscript 0 in the Entry array. Subscript = ABCD0000&(16-1) = 0. And then we’re going to store another pair whose key hashcode is 0DEF0000, and we’re going to get array subscript 0 again. As you can already see, this is a poorly implemented hash algorithm because the 1 bit of HashCode is concentrated in the first 16 bits, resulting in an array with a constant index of 0. Therefore, pairs with very different keys are stored in the same linked list, resulting in slower query in the future. For example, 0ABC0000 becomes A02188B, 0DEF0000 becomes D2AFC70, and their array subscripts are no longer uniform zeros. What a hash function does is it’s easy to draw a graph that affects the value of each place by the value of the other place.

In the source code we see h&(n-1) operation, in fact, this is the same as h % n. It is just a very big modulus of the time will affect the efficiency, but through the bit operation is a lot faster!

resize

  1. Resize () is used for initialization and array expansion of the HashMap.

  2. When you expand an array, it’s twice as big

  3. Migrate data

 final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
     	// oldTab must be null if it is initialized and not null if it is initialized
        int oldCap = (oldTab == null)?0 : oldTab.length;
     	// Record the previous threshold
        int oldThr = threshold;
     	// Define a new capacity and a new threshold
        int newCap, newThr = 0;
     	// The capacity is already available
        if (oldCap > 0) {
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                // At this point, the capacity of the front has doubled
                // The threshold is also doubled
                newThr = oldThr << 1; // double threshold
        }
     	// Call new HashMap(initCapacity), put for the first time
        else if (oldThr > 0) 
            // If the capacity is specified, for example, initCapacity is 22, newCap is 32
            newCap = oldThr;
     	// Call new HashMap(), put for the first time
        else { 
            // The capacity is the capacity specified inside the default class, which is 16
            newCap = DEFAULT_INITIAL_CAPACITY;
            The default loading factor is 0.75, so the threshold is 16*0.75 = 12
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
     	// Call new HashMap(initCapacity) or
     	// New HashMap(initCapacity,loadFactor)
     	// Because both of the above constructors initialize loadFactor
     	// Initializes the new threshold based on the new capacity
        if (newThr == 0) {
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        threshold = newThr;
     
     
     	// Create a new array and assign it to table
        @SuppressWarnings({"rawtypes"."unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
     
        if(oldTab ! =null) {
            // Iterate over the array for data transfer
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                // Get the node of the corresponding array position, if null, there is no node
                // If not, transfer
                if((e = oldTab[j]) ! =null) {
                    oldTab[j] = null;
                    
                    if (e.next == null)
                        // There is only one node in the corresponding position
                        // This node modulates the length of the new array
                        newTab[e.hash & (newCap - 1)] = e;
                    // In the case of a red-black tree
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    else {
                        // Two linked lists are defined here, lo and hi
                        // e is the first node of the array
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        // The following do-while is a list of the current position of the list, and then
                        // According to some rules, put the linked list nodes in different places in the expanded array
                        do {
                            // Get the next node of e
                            next = e.next;
                            // The following explanation may be a bit difficult to understand, but we will look at the following explanation and then look at here
                            //1 If the node has the same hash position, the lo list is used to store the node
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            //2 then the new location is stored in the hi list
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
                        // The lo list is not empty, so the old place of the array is the head of LO
                        if(loTail ! =null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // If the hi list is not empty, put the hi header in the old position of the array
                        if(hiTail ! =null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
Copy the code

The following diagram illustrates the process of data transfer.

Capacity to explain

Now to explain the resize source code in question

(e.hash & oldCap) == 0.

n = (tab = resize()).length;

if ((p = tab[i = (n - 1) & hash]) == null)

We know from these two statements of put that we’re using the hash and the length of the array -1 to figure out where the node is mapped to.

A:

It’s pretty simple. N is the length of the array, and we all know that the length of the array in a HashMap must be m squared of 2. If the length of the array is 16, 16 is 2 to the fourth power. Then 16-1 = 15 is 4 all ones.

  1. Before the expansion, the hash value obtained by using the hash (key) is combined with (n-1) to obtain the position. In the above example, the lower four bits of the hash value are extracted and the result is A. (The result must be between 0 and 15)

  2. If the size of the array is doubled, then the length of the array is 32, and the binary value is 100,000, then the same hash value is used to get the position of (32-1). So if you take the bottom five bits of the hash value, you get b. (The result must be between 0 and 31)

2:

We know from 1 and 2 that the binary of B has one more bit than the binary of A, and the first four bits are the same. And in binary, it’s either a 0 or a 1.

  • If the extra 1 bit is 0, then b and A are the same. For example, if a is 1001 and B is 01001, then a and B are equal, that is, the position of the same node in the new array is the original position of the old array.
  • If the extra 1 bit is 1, then b is 2 to the m more than a. For example, if A is also 1001 in decimal, 9, b 11001, and 25 in decimal, then B is 2 to the fourth more than A, and this 2 to the fourth power is exactly the length of the original array oldCap. In other words, the location of the same node in the new array is the original location of the old array plus 2 to the fourth power (the length of the unexpanded array (oldCap)).

So we conclude that as long as we can determine whether the bit where B has more than A is 1 or 0(in this case, bit 5), we can figure out where the same node is in the new array, in a nutshell. After the array is expanded, the same node is either in its original location or added to its original location by the length of the unexpanded array (oldCap).

So how do we figure out what is the digit where B is more than A? It’s easy to hash oldCap with oldCap. For example, if oldCap is 16, then binary is 1 followed by m zeros, i.e. 10000, that is, the m+1 bit is 1. Using the hash value and oldCap, we can find the 5th bit of the hash value, and then we can determine the node’s position in the new array according to the previous “two”.

Readers see here, can continue to go back to the source of 1 to see, this time should suddenly see!

Capacity summary

During capacity expansion, nodes in the original table will be re-hash to the new table, but the positions of nodes in the new table are related to each other: either the subscripts are the same or the oldCap is different (the size of the original table).

conclusion

  1. This article mainly explains the HashMap of some important methods of source code parsing, but also let oneself have a further in-depth understanding of HashMap.
  2. Since HashMap is a thread-safe class, it can be used if it is thread-safeCocurrentHashMaporCollections.synchronizedMapEncapsulate the HashMap object as thread-safe.
  3. The key value of a HashMap is allowed to be null, and it puts the element in the first position of the array it maintains.
  4. HashMap introduced red-black trees after Java8, which were not there before 1.8. They were simply linked lists, so the 1.8 version is more flexible.
  5. Through the analysis of the source code to understand a variety of bit operations, but also strengthen their own basic knowledge.

reference

  • Blog.csdn.net/fan2012huan…
  • www.javadoop.com/post/hashma…