Probably the best source analysis of HashMap for beginners

This post was originally posted on my blog

preface

A HashMap uses a storage structure of key-value pairs. The bottom layer of the system is realized by array, chain address hashing and red-black tree. The HashMap source for this article is based on JDK1.8.

The source code parsing

variable

Buckets, which are often mentioned below, are actually any location in the table array

It doesn’t matter if you don’t understand the meaning of variables here, as you will see later in the method call

// Table initial capacity 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;
LoadFactor =0.75f if the constructor does not provide a loadFactor
static final float DEFAULT_LOAD_FACTOR = 0.75 f;
//table >8 nodes in a bucket to a red-black tree
static final int TREEIFY_THRESHOLD = 8;
// Restore the tree to a linked list when <=6 nodes in the bucket
static final int UNTREEIFY_THRESHOLD = 6;
// The table should be treed only when the table length exceeds 64. Otherwise, call resize() to expand the table even if the number of nodes in a bucket is greater than or equal to 8
static final int MIN_TREEIFY_CAPACITY = 64;
// The underlying array
transient Node<K,V>[] table;
/ / entrySet cache
transient Set<Map.Entry<K,V>> entrySet;
// The number of key-value pairs
transient int size;
// Total number of adjustments
transient int modCount;
// If the threshold exceeds this value, you need to expand capacity * load_factor
int threshold;
// Load factor
final float loadFactor;
Copy the code

A constructor

There are four constructors in total

    //1
	public HashMap(a) {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }
	// the constructor for the custom initial capacity calls constructor 3 with the default load factor
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
	//3. Customize the initial capacity and load factor
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        // The threshold is used as the table capacity
        this.threshold = tableSizeFor(initialCapacity);
    }
	// build a HashMap based on a map's key-value pairs
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }
Copy the code

TableSizeFor (int cap) method

In constructor 3, threshold is computed by tableSizeFor. Take a look at this method

    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
Copy the code

The function of this method is to find a minimum k such that 2^k>=cap. This operation causes that tableSizeFor will eventually process whatever initialCapacity is passed in the constructor into 2^ K form. This operation will be explained later.

Just a quick explanation of why tableSizeFor works. If cap=6, then n=6-1=5. As the binary n 101, after the unsigned right shift into 010, 101 | 010 = 111. The new n is 111, so every time you shift it, because you’re doing or, you’re going to get 111. The final return is n+1, which corresponds to the decimal 8, which is 2^3.

As can be seen, each shift or operation doubles the number of consecutive ones starting from the high binary of n. Since constructor 3 limits the maximum initialCapacity to MAXIMUM_CAPACITY=1 << 30, in the most extreme case, the final binary of N corresponds to 30 ones. And then finally, plus 1, it’s exactly 2 to the 30th.

So why do we set n=cap-1 when we take n in the first step? If we start with cap of 2 to the k, then if we don’t subtract 1 at the beginning, we end up with n equal to 2 to the k plus 1, which is obviously not what we want. Therefore, cap is first reduced by 1, and the binary number corresponding to N is k ones, which is not affected by shift or operation. And then finally, when I return, n plus 1 goes back to 2 to the k.

Put (K key, V value) method

The method of storing key-value pairs is actually implemented through the putVal method

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false.true);
    }
Copy the code

Hash (Object key) method

Calculate the hash value of key, because key.hashcode is an int value, (h = key.hashcode ()) ^ (h >>> 16) equals xor of the high and low 16 bits, making the distribution more loose. The specific reason can see zhihu’s highly praised answer.

    static final int hash(Object key) {
        int h;
        return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
    }
Copy the code

PutVal method

Before introducing putVal, let’s take a look at a diagram to get a feel for how hashMap stores data. It can be seen that when the number of nodes in a bucket in the table exceeds the critical value, the original linked list will be transformed into a red-black tree. Another prerequisite for this transformation is that the table has a length greater than or equal to 64, which is not shown in the figure for convenience.

Methods the process

If the table is empty at first, call the expansion method to initialize the table
If the bucket to be inserted is empty, insert it directly
If the bucket already has elements and the first element has the same hash value as the element to be inserted, go to subsequent processing; otherwise, go to 4
Hash conflicts are handled according to red-black tree logic if tree nodes are used. If it is an ordinary node, the list is traversed to see if there is a key that is the same as the element to be inserted. If the key is found, skip the loop and go to 5, otherwise go to 6
Check whether the oldValue needs to be updated according to onlyIfAbsent. The method ends after return oldValue
Insert new nodes at the end of the linked list, and judge whether the number of nodes reaches the critical value, then tree. Go to the seven
Add 1 to the total number of nodes and determine whether to expand

Let’s look at the code

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the table has not been initialized, call resize to initialize it
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // If the current hash position is empty, just put the key-value pair in it
        //(n-1) &hash is equivalent to hash % n, for reasons explained later
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        // If the location already has elements, use chain hash
        else {
            Node<K,V> e; K k;
            // If the key of the first node p is the same as the key to be inserted
            // Save the value of p and update the value directly
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                e = p;
            // If p is a tree node, insert according to tree logic
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                // Otherwise, find the insertion position by traversing the list corresponding to the current position
                for (int binCount = 0; ; ++binCount) {
                    // If no node with the same key value is found at the end of the list, the new node is placed at the end of the list
                    if ((e = p.next) == null) {
                        // Save to the end
                        p.next = newNode(hash, key, value, null);
                        // If the length of the list exceeds the threshold, tree the list
                        //bincount shortens the first node, so we need to subtract 1
                        // Note that bincount does not count newNode
                        // So there are actually 9 nodes in the tree
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // Find the same element in the list as the key to be inserted
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        break; p = e; }}// if e is not null, the linked list has the same key value as the key to be inserted
            if(e ! =null) { // existing mapping for key
                V oldValue = e.value;
                //onlyIfAbsent The default false indicates that oldValue is updated with a new value when the key to be inserted already exists
                // Otherwise keep the original key-value unchanged
                if(! onlyIfAbsent || oldValue ==null)
                    e.value = value;
                // In hashMap, this method is only used for empty LinkedHashMap
                afterNodeAccess(e);
                // Return the old value
                returnoldValue; }}// If the map does not have the same key as the element to be inserted, increment the number of changes by 1
        ++modCount;
        // The number of nodes is increased by 1 if the threshold is exceeded
        if (++size > threshold)
            resize();
        // This method is also used by LinkedHashMap
        afterNodeInsertion(evict);
        return null;
    }
Copy the code

Now explain the problems mentioned in the above notes

N minus 1 and hash

First, hash is an int range, which in Java is -2147483648 to 2147483647. If all hashes were indexed to the table array, the table. Length would be over 4 billion! This is clearly not scientific. Therefore, we can use hash % n as the index value, which makes the range much smaller.

On a computer, ampersand is faster than %. Also, (n-1) &hash is hash % n only if the length of the table is 2 to the k power. Why is that? First of all, when n = 2^k, n-1 corresponds to k ones in base 2, so (n-1) &hash takes the value of the last K bit of the hash, because in the binary of the hash, the value before the last k bit must be a multiple of n, so it must be 0. So the last k bits are the result of hash % n. This also explains why the length of the table has to be 2 to the k.

The following is a schematic diagram, taking n=8 and hash=75 as an example

Resize () Expansion method

Used to initialize a table or double the capacity of a table

Methods the process

If the old capacity is greater than 0, check whether the capacity exceeds the upper limit after doubling. If not, double the capacity. If the old threshold overflows after doubling, go to 4
If the old capacity is equal to 0 and the old threshold is greater than 0, the new capacity is equal to the old threshold. Go to the 4
If both the old capacity and the old threshold are 0, the default values are used for initialization
Calculate the new threshold using newCap * loadFactor
Create a new array based on the new capacity
Map the elements of the old array to the new array, and split the linked list in the original bucket into two parts. The lower part is still in the position J of the original bucket, and the higher part is put in the position oldCap+ J, oldCap is the old capacity

    final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
        / / the old capacity
        int oldCap = (oldTab == null)?0 : oldTab.length;
        / / old threshold
        int oldThr = threshold;
        int newCap, newThr = 0;
        // If the old capacity is greater than 0, the system has been initialized
        if (oldCap > 0) {
            // If the old capacity exceeds the maximum capacity, set the threshold to the maximum int and the capacity will not be expanded
            if (oldCap >= MAXIMUM_CAPACITY) {
                threshold = Integer.MAX_VALUE;
                return oldTab;
            }
            // If two times of the old capacity is still within the maximum capacity and the old capacity is greater than or equal to 16, double the capacity and threshold
            else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                     oldCap >= DEFAULT_INITIAL_CAPACITY)
                newThr = oldThr << 1;
        }
        else if (oldThr > 0) 
            // If the constructor with arguments is called the first putVal will enter here oldCap=0
            // Make the new capacity equal to the old threshold value and the new threshold value will be calculated later
            newCap = oldThr;
        else {               
            // If the HashMap() constructor is called, the branch is entered. OldCap and oldThr are both 0
            // Initialize with default values
            newCap = DEFAULT_INITIAL_CAPACITY;
            newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
        }
        //newThr = 0 can happen in one of two ways
        NewThr = oldThr << 1; NewThr overflow becomes 0
        // The other code enters the oldThr > 0 branch without counting newThr
        if (newThr == 0) {
            // Calculate newThr by the formula
            float ft = (float)newCap * loadFactor;
            newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                      (int)ft : Integer.MAX_VALUE);
        }
        // Update the threshold
        threshold = newThr;
        // Create an array based on the expanded capacity
        @SuppressWarnings({"rawtypes","unchecked"})
            Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
        if(oldTab ! =null) {
            // Iterate over the old array
            for (int j = 0; j < oldCap; ++j) {
                Node<K,V> e;
                if((e = oldTab[j]) ! =null) {
                    oldTab[j] = null;
                    // If there is only one element in the bucket, map it directly to the location of the new array
                    if (e.next == null)
                        newTab[e.hash & (newCap - 1)] = e;
                    // If it is a tree node, split the tree and then map it
                    else if (e instanceof TreeNode)
                        ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                    // If there is more than one element in the bucket and it is not a tree, split the corresponding list elements in the bucket into two lists according to the conditions
                    else { // preserve order
                        Node<K,V> loHead = null, loTail = null;
                        Node<K,V> hiHead = null, hiTail = null;
                        Node<K,V> next;
                        do {
                            next = e.next;
                            // Suppose oldCap = 2^k
                            // Then the binary hash value from the lowest to the highest bit of the k+1 is 0 into the branch
                            // Branch is linked list join operation
                            if ((e.hash & oldCap) == 0) {
                                if (loTail == null)
                                    loHead = e;
                                else
                                    loTail.next = e;
                                loTail = e;
                            }
                            // The binary hash value from the lowest to the highest k+1 bit of 1 enters the branch
                            else {
                                if (hiTail == null)
                                    hiHead = e;
                                elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
                        // The position of the low list in the new array is exactly the same as in the original array
                        if(loTail ! =null) {
                            loTail.next = null;
                            newTab[j] = loHead;
                        }
                        // The position of the high list in the new array is larger than the old array oldCap
                        if(hiTail ! =null) {
                            hiTail.next = null;
                            newTab[j + oldCap] = hiHead;
                        }
                    }
                }
            }
        }
        return newTab;
    }
Copy the code

Get (Object key) method

Get value based on key

Methods the process

Locate the table based on the hash value of the key. If the table is not empty, go to 2; otherwise, null is returned
Determine if the first element in the bucket is desired, if so, return it, otherwise go to 3
If there is more than one node in the bucket, the tree node uses the logical lookup of a binary balanced tree, with ordinary nodes traversing the linked list

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // Table is not empty and the hash position is not empty to enter the branch
        if((tab = table) ! =null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) ! =null) {
            // Check whether the first node in the bucket is desired
            if (first.hash == hash && // always check first node((k = first.key) == key || (key ! =null && key.equals(k))))
                return first;
            // The first node is not the one to be found and there is more than one node in the bucket
            if((e = first.next) ! =null) {
                // If the tree node calls the tree lookup method, the binary balanced tree lookup method is used
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    // Otherwise iterate over the list
                    if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                        return e;
                } while((e = e.next) ! =null); }}return null;
    }
Copy the code

getTreeNode(int h, Object k)

GetTreeNode is a method of the TreeNode class. Here is the simplified TreeNode class

static final class TreeNode<K.V> extends LinkedHashMap.Entry<K.V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        final TreeNode<K,V> getTreeNode(int h, Object k) {
            // If the parent node of getTreeNode is not empty, start at the root of the tree
            return((parent ! =null)? root() :this).find(h, k, null);
        }
    	// Balance binary tree lookup
        final TreeNode<K,V> find(inth, Object k, Class<? > kc) {
            TreeNode<K,V> p = this;
            do {
                int ph, dir; K pk;
                TreeNode<K,V> pl = p.left, pr = p.right, q;
                // The root hash is larger than h
                if ((ph = p.hash) > h)
                    p = pl;
                // Otherwise, go to the right subtree
                else if (ph < h)
                    p = pr;
                // The key is found
                else if((pk = p.key) == k || (k ! =null && k.equals(pk)))
                    return p;
                // Find the right subtree if the left subtree is empty
                else if (pl == null)
                    p = pr;
                // If the right subtree is empty, check the left subtree
                else if (pr == null)
                    p = pl;
                // Compare according to the comparison logic passed in by the user
                else if((kc ! =null|| (kc = comparableClassFor(k)) ! =null) && (dir = compareComparables(kc, k, pk)) ! =0)
                    p = (dir < 0)? pl : pr;// Go to the right subtree
                else if((q = pr.find(h, k, kc)) ! =null)
                    return q;
                // go to the left subtree
                else
                    p = pl;
            } while(p ! =null);
            return null; }}Copy the code

Remove (Object key) method

Delete elements by key

Methods the process

Search for the bucket based on the hash value of the key
Returns NULL if the bucket is empty
If the bucket is not empty and the first element is to be deleted, use node to preserve the node. Otherwise, go to 4
Look at subsequent elements, and if it’s a tree node, the logic of the tree is called, while regular nodes traverse the list
If the node to be deleted is found, it is deleted, otherwise null is returned

    public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null.false.true)) = =null ?
            null : e.value;
    }
    final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        if((tab = table) ! =null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) ! =null) {
            Node<K,V> node = null, e; K k; V v;
            // The first element in the bucket is the element to be deleted
            if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
                node = p;
            // If the first element is not to be deleted, look at subsequent elements
            else if((e = p.next) ! =null) {
                // If it is a tree, delete it with red-black tree logic
                if (p instanceof TreeNode)
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                else {
                    // Otherwise iterate over the list
                    do {
                        if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while((e = e.next) ! =null); }}// The point to delete is found
            // If matchValue is false, delete immediately
            // If matchValue is true then check if the values match
            if(node ! =null&& (! matchValue || (v = node.value) == value || (value ! =null && value.equals(v)))) {
                // Delete the tree node
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                // Make the bucket head point to the second element if you want to delete the first element of the bucket
                else if (node == p)
                    tab[index] = node.next;
                // Delete nodes from regular lists
                else
                    p.next = node.next;
                ++modCount;
                // The number of elements is reduced by 1
                --size;
                // The function body is empty LinkedHashMap, and visting emoval is used
                afterNodeRemoval(node);
                returnnode; }}return null;
    }
Copy the code

conclusion

This paper analyzes the most commonly used methods of Put, get and remove of HashMap, and analyzes the code of its association method.

Among them, the insertion and deletion logic of red-black tree is not specifically analyzed in this paper. I plan to write a special article on red black trees later, and then add this part to it.

Refer to the article

Juejin. Cn/post / 684490…

Segmentfault.com/a/119000001…

Probably the best source analysis of HashMap for beginners

preface

The source code parsing

variable

A constructor

TableSizeFor (int cap) method

Put (K key, V value) method

Hash (Object key) method

PutVal method

Methods the process

Resize () Expansion method

Methods the process

Get (Object key) method

Methods the process

getTreeNode(int h, Object k)

Remove (Object key) method

Methods the process

conclusion

Refer to the article

Related Posts

LeetCode brush problem – the NTH Tibonacci number

Python + Appium automatic test -openCV determine the similarity of images

Cocoapods binary