Read the source code, HashMap review

This article first summarizes the previous two collections to fill in some omissions, and then gives a brief introduction to HashMap.

review

Since the first two arrayLists and LinkedLists were all about individual collection classes, today’s analysis of HashMap can be combined to take a look at the Collection framework in Java. The figure below is only a small part, and the abstract classes have been removed for ease of understanding.

Collections in Java (sometimes called containers) are meant to store objects, and most of the time, more than one object.

Java collections can be easily divided into two categories:

One is Collection, which stores individual elements, that is, a single object. Subdivision, the common ones are List, Set, Queue. List guarantees that elements are stored in the order they are inserted. Set cannot have repeating elements. Queue Accesses elements according to the rules of the Queue, usually on a “first in, first out” basis.
One is a Map, which stores “key-value pairs” that look up values by keys. For example, in real life, phone numbers are searched by name, and personal details are searched by ID number.

Theoretically, we can use only the Collection system, such as encapsulating key-value pairs into objects and storing them in the implementation class of the Collection. The main reason for proposing Map is efficiency.

Introduction of a HashMap

A HashMap is used to store key-value pairs, two elements at a time. In JDK1.8, its implementation is based on == array + linked list + red black tree ==, simply speaking, the ordinary situation directly use the array, hash conflict occurs in the conflict position changed to linked list, when the list exceeds a certain length, changed to red black tree.

== stores a linked list or red-black tree == in an array.

With no hash collisions at all, each element of the array is a linked list of capacity 1. Such as elements on indexes 0 and 1.
When a minor hash conflict occurs, each element of the array is a linked list of multiple elements. Such as the element on index 2.
When the number of conflicts exceeds 8, each element of the array is a red-black tree. Such as the element on index 6.

The following figure is a schematic diagram, and the related structure does not strictly follow the specification.

Class signature

public class HashMap<K.V> extends AbstractMap<K.V>
    implements Map<K.V>, Cloneable.Serializable
Copy the code

The following figure

Implement Cloneable and Serializable interfaces, with clone and serialization capabilities.

HashMap inherits AbstractMap and implements the Map interface for the same reason as LinkedList.

constant

// Serialize the version number
private static final long serialVersionUID = 362498820763181265L;

// The default initial capacity is 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; 

// Maximum capacity, 2 ^ 30
static final int MAXIMUM_CAPACITY = 1 << 30;

// The default load factor is 0.75
static final float DEFAULT_LOAD_FACTOR = 0.75 f;


// The following three constants should be combined
// The threshold for the list to become a tree
static final int TREEIFY_THRESHOLD = 8;

// The threshold for converting a tree to a linked list is less than 6
static final int UNTREEIFY_THRESHOLD = 6;

// The minimum size of the set when the list is turned to the tree. Only if the total capacity is greater than 64 and the number of conflicting lists is greater than 8 is converted to a tree.
static final int MIN_TREEIFY_CAPACITY = 64;
Copy the code

The key of the above variables lies in the timing of the list to turn tree and the tree to turn list.

If the size of the array is smaller than 64, no matter how many conflicts exist in the array, the system does not tree and expands the array.
When the array size is greater than or equal to 64,
- If the number of conflicts is greater than 8, tree is performed.
- When the number of elements in a red-black tree is less than 6, the tree is converted to a linked list.

variable

// An array of storage nodes, always a power of 2
transient Node<K,V>[] table;

// See the corresponding constructor for details
transient Set<Map.Entry<K,V>> entrySet;

// The actual number of key-value pairs stored
transient int size;

// The number of times to change the map for quick failure
transient int modCount;

// Critical value for capacity expansion, essentially capacity * load factor
int threshold;

// Load factor
final float loadFactor;
Copy the code

The type of node stored in the array, as you can see, in addition to K and Value, also contains a reference to the next node. As we said at the beginning, the node is really a one-way linked list.

static class Node<K.V> implements Map.Entry<K.V> {
    
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
    
   / /... Omit common methods
}
Copy the code

A constructor

Common no-parameter constructs and one-parameter constructs are simple and pass values directly, omitted here. Take a look at the construction of the two parameters.

public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: "+initialCapacity);
    
    // The specified capacity cannot exceed the maximum value
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    
    this.loadFactor = loadFactor;
    // Convert the given capacity to a power not less than 2 of itself
    this.threshold = tableSizeFor(initialCapacity);
}
Copy the code

TableSizeFor method

One of the above methods is very clever, tableSizeFor, which converts a given numeric value to the smallest integer power of 2 that is not less than itself.

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0)?1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
Copy the code

For example, if cap=10, convert to 16; Cap =32, it’s still 32. Bit operation is used to ensure efficiency.

One question, why do we have to convert capacity to a power of two? Why don’t you need an ArrayList? The key is that hashing, or more accurately converting to powers of two, reduces the hash conflict to a certain extent.

It’s easy to draw a sketch of these operations, but the point is it’s cool to think of them. Explanation words with too many pictures, here space is limited, put the content in another article, HashMap of tableSizeFor method diagram.

Add elements

In the constructor above, we do not see the initialization of the array (Node

[] table). This step is reserved for adding the element PUT.
,v>

public V put(K key, V value) {
    return putVal(hash(key), key, value, false.true);
}
Copy the code

You can see that put calls putVal.

PutVal method

Before that, let’s review the composition of HashMap: array + linked list + red-black tree. If the array is empty, store it in the array, not empty, store it in the list, overload the list, convert it to a red-black tree.

final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    
    Node<K,V>[] tab; 
    Node<K,V> p; 
    int n, i;
    // If the array is empty, expand the capacity
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // The hash value of position I in the array is calculated based on the key.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // The corresponding position of the array is not empty
    else {
        Node<K,V> e; 
        K k;
        // The key exists on the corresponding node key, overwriting value directly
        if(p.hash == hash &&((k = p.key) == key || (key ! =null && key.equals(k))))
            e = p;
        // is a red-black tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // is a linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // Lists are converted to red-black trees
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    break; p = e; }}if(e ! =null) { 
            V oldValue = e.value;
            if(! onlyIfAbsent || oldValue ==null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    
    ++modCount;
    // Do you need to expand the capacity before the next addition? If the capacity is full, expand the capacity in advance
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
Copy the code

Resize () is a complex method, it is best to use IDE tools, debug, it is easier to figure out how to expand and when, if dry words will be confused.

Access to elements

The getNode method is called internally to get the corresponding value based on the key

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
Copy the code

GetNode method

final Node<K,V> getNode(int hash, Object key) {
    
    Node<K,V>[] tab; 
    Node<K,V> first, 
    e; int n; 
    K k;
    
    // The array is not empty
    if((tab = table) ! =null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) ! =null) {
        // The value of the first object is returned
        if(first.hash == hash && ((k = first.key) == key || (key ! =null && key.equals(k))))
            return first;
        if((e = first.next) ! =null) {
            // search in the red-black tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                // find in the list
                if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                    return e;
            } while((e = e.next) ! =null); }}return null;
}
Copy the code

conclusion

There are too many contents in HashMap and many knowledge points related to each content. Due to the limitation of space and personal ability, it is difficult to explain all contents clearly. For example, the most basic method of obtaining hash value is also very particular. When you have the opportunity to write down specific details slowly.