All of my original Android knowledge has been packaged up on GitHub, as a series of articles written so that junior and mid-level engineers can understand them. Stars are welcome~

1. Storage structure

1.1 JDK 1.7

Internally, data is stored in an array of Entry objects; each Entry holds a key and a value. An Entry contains four fields: key, value, next, and hash. The next field points to the next Entry — entries that map to the same bucket are linked together. Each array slot is the head of a singly linked list (also called a bucket). Conflicts are resolved by separate chaining (the "zipper method"): entries in the same list all map to the same slot.

transient Entry<K,V>[] table;

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
}

1.2 JDK 1.8

As in 1.7, entries that map to the same bucket in 1.8 are placed in a singly linked list. The difference: when a list grows beyond 8 nodes, it is converted into a red-black tree to improve search efficiency.

2. Basic concepts

2.1 Load factor and Threshold

/** The default initial capacity - MUST be a power of two. */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/** The load factor used when none specified in constructor. */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

The default capacity of a HashMap is 16 and the default load factor is 0.75, meaning the array must be expanded once the number of stored elements exceeds 16 * 0.75 = 12 — that product is the threshold. The load factor controls when to expand.

Threshold = current array length * load factor

The threshold has to be recalculated after each expansion. With the default array length of 16 and load factor of 0.75, expanding when elements reach 75% of capacity is a reasonable compromise. If the factor is set too high, hash conflicts become severe and buckets get deep, slowing lookups; if set too low, space is wasted. In general the load factor is not customized.
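As a quick sanity check, the arithmetic can be sketched in a few lines (a standalone illustration; `threshold` here is a hypothetical helper, not the JDK field):

```java
// Standalone sketch: how the threshold evolves under the default 0.75 load factor.
public class ThresholdDemo {
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // Default table: 16 * 0.75 = 12 -> the 13th insert triggers a resize.
        System.out.println(threshold(16, 0.75f)); // 12
        // After doubling: 32 * 0.75 = 24.
        System.out.println(threshold(32, 0.75f)); // 24
    }
}
```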


2.2 Working principle of zipper method

Each time a new key-value pair is stored, the key's hash is computed first, and the bucket index is obtained via hash % array length. The bucket's list is then searched to see whether the key already exists: if it does, the value is updated; if not, a new entry is inserted at the head of the list.
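The steps above can be sketched with a minimal chained table (a toy illustration of the 1.7 head-insertion scheme; `ToyChainedMap` and its fields are made up for this example, not JDK code):

```java
// Toy separate-chaining table illustrating the put flow described above.
public class ToyChainedMap {
    static class Entry {
        final Object key; Object value; Entry next;
        Entry(Object key, Object value, Entry next) {
            this.key = key; this.value = value; this.next = next;
        }
    }

    final Entry[] table = new Entry[16];

    Object put(Object key, Object value) {
        int i = key.hashCode() & (table.length - 1); // bucket index
        for (Entry e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {                 // key exists: update value
                Object old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry(key, value, table[i]);  // head insertion, as in 1.7
        return null;
    }

    public static void main(String[] args) {
        ToyChainedMap m = new ToyChainedMap();
        System.out.println(m.put("a", 1)); // null (new key)
        System.out.println(m.put("a", 2)); // 1 (old value replaced)
    }
}
```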

3. Source code analysis

3.1 JDK 1.7

With the above basic concepts in place, let's start reading the source code.

public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable
{
    // The default initial capacity, which must be a power of 2, is 16
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    // Maximum capacity
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Default load factor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // An empty array by default
    static final Entry<?,?>[] EMPTY_TABLE = {};
    // The array that holds the real data
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
    // The actual number of key-value pairs in the current HashMap
    transient int size;
    // Threshold = array length * load factor
    int threshold;
    // Load factor
    final float loadFactor;
    // Identifies the number of structural changes to the HashMap. Structural changes are additions, subtractions, or other changes to its internal structure (such as rehash).
    // For iterators to fail quickly.
    transient int modCount;
    
    public HashMap() {
        this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
    }
    
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }
    
    // You can specify the array size and load factor at the same time
    public HashMap(int initialCapacity, float loadFactor) {
        ... // omit some argument checks
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        ...
        this.loadFactor = loadFactor;
        threshold = initialCapacity;
        ...
    }

    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;
    }
}

3.1.1 put

These are the basic fields of a HashMap, all of them fairly important. Next, let's look at the implementation of the put method that adds an element. Here is the put code from JDK 1.7:

public V put(K key, V value) {
    //1. Array is empty -> initialize (create) array
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    //2. If the key is null, it is processed separately
    if (key == null)
        return putForNullKey(value);
    //3. Compute the hash value
    int hash = hash(key);
    //4. Calculate at which index of the array the hash value should be stored
    int i = indexFor(hash, table.length);
    //5. Iterate through the list (each element of the array is the head of a single list) to see if the same key already exists in the list. If so, replace it
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    //6. Add elements to the array
    addEntry(hash, key, value, i);
    return null;
}
3.1.1.1 inflateTable Array Initialization

A few lines of code hide a lot of detail, so let's look at them one by one. First, the inflateTable method that initializes the array; the threshold is passed in.

private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);

    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}

private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}

//Integer.highestOneBit
public static int highestOneBit(int var0) {
    // Smear the highest 1 bit into every lower bit position
    var0 |= var0 >> 1;
    var0 |= var0 >> 2;
    var0 |= var0 >> 4;
    var0 |= var0 >> 8;
    var0 |= var0 >> 16;

    // >>> is an unsigned right shift: the high bits are filled with zeros
    // regardless of sign. The subtraction leaves only the leftmost 1 bit.
    return var0 - (var0 >>> 1);
}


The roundUpToPowerOf2 method finds the smallest power of 2 that is at least number. The code looks a little confusing, but it determines how long the array should be, automatically rounding the requested capacity up to 2^n.

Integer.highestOneBit keeps the leftmost 1 bit of its argument and zeroes every bit after it. For example, highestOneBit(7) (0111) is 4 (0100). For number = 9, (number - 1) is 8 (1000), shifted left to 10000 (16), whose highest bit is 16 — so a capacity of 9 is rounded up to 16.

Once the capacity is calculated, the threshold is calculated and the array is initialized.
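The examples above are easy to verify (a standalone sketch of the same rounding logic; the MAXIMUM_CAPACITY clamp is omitted):

```java
public class RoundUpDemo {
    // Same idea as JDK 1.7's roundUpToPowerOf2, minus the MAXIMUM_CAPACITY clamp.
    static int roundUpToPowerOf2(int number) {
        return (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.highestOneBit(7));  // 4: only the leftmost 1 bit survives
        System.out.println(Integer.highestOneBit(9));  // 8
        System.out.println(roundUpToPowerOf2(9));      // 16: (9-1)<<1 = 16, highest bit is 16
        System.out.println(roundUpToPowerOf2(16));     // 16: already a power of two
    }
}
```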

3.1.1.2 putForNullKey Add a NULL key

A special method is used to handle cases where the key is null

/** * Offloaded version of put for null keys */
private V putForNullKey(V value) {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    addEntry(0, null, value, 0);
    return null;
}

The null key is always placed in the first slot of the array. Slot 0 is still an ordinary bucket: only one element in it can have a null key; the rest land there via the hash algorithm.

3.1.1.3 Compute the hash value

/**
 * Retrieve object hash code and applies a supplemental hash function to the
 * result hash, which defends against poor quality hash functions. This is
 * critical because HashMap uses power-of-two length hash tables, that
 * otherwise encounter collisions for hashCodes that do not differ
 * in lower bits. Note: Null keys always map to hash 0, thus index 0.
 */
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

Once the key's hashCode is obtained, some extra bit-mixing is applied to it. The hash algorithm here is carefully designed; a well-designed hash greatly reduces collisions.
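To see why the mixing helps, consider two hashCodes that differ only in their high bits (a standalone sketch reusing the 1.7 shift constants; the hashSeed and String paths are omitted):

```java
public class PerturbDemo {
    // The supplemental hash from JDK 1.7 (hashSeed / String path omitted).
    static int mix(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int a = 0x10001, b = 0x20001;      // differ only above bit 15
        int mask = 16 - 1;                 // default table: low 4 bits pick the bucket
        System.out.println((a & mask) == (b & mask));           // true: collision without mixing
        System.out.println((mix(a) & mask) == (mix(b) & mask)); // false: buckets 0 and 3 after mixing
    }
}
```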

3.1.1.4 indexFor Calculates the index of an element in an array
 /** * Returns index for hash code h. */
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}

indexFor computes the bucket index from the hash. Why does h & (length - 1) equal h % length? It depends on length, which, as mentioned earlier, must be a power of two. A simple example: h = 2, length = 8.

h & (length-1)
= 00000010 & 00000111
= 00000010

The result above is 2, and 2 % 8 is indeed 2.
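The identity holds for every non-negative h as long as length is a power of two — a quick brute-force check:

```java
public class IndexForDemo {
    static int indexFor(int h, int length) {
        return h & (length - 1); // only valid when length is a power of two
    }

    public static void main(String[] args) {
        int length = 8;
        for (int h = 0; h < 1000; h++) {
            if (indexFor(h, length) != h % length) throw new AssertionError("mismatch at " + h);
        }
        System.out.println(indexFor(2, 8)); // 2, matching the example above
    }
}
```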

3.1.1.5 addEntry Adds an element to an array

When an element is added, either the target bucket was empty beforehand, or it already contained elements (a hash conflict).

/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket. It is the responsibility of this
 * method to resize the table if appropriate.
 *
 * Subclass overrides this to alter the behavior of put method.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    //1. The number of key-value pairs exceeds the threshold && the slot at this index is not empty (elements already exist here)
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Expand -> double the original size
        resize(2 * table.length);
        hash = (null != key) ? hash(key) : 0;
        bucketIndex = indexFor(hash, table.length);
    }

    //2. Create an Entry node
    createEntry(hash, key, value, bucketIndex);
}

// Create new nodes
void createEntry(int hash, K key, V value, int bucketIndex) {
    // The old head table[bucketIndex] becomes the next of the newly inserted node (head insertion)
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}


If the number of key-value pairs exceeds the threshold, the table is expanded:

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }

    // Create an array based on the new capacity
    Entry[] newTable = new Entry[newCapacity];
    // Transfer the data to the new array
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    // Update the threshold
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}

// Transfer the data to the new array
void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        // Walk the whole bucket
        while (null != e) {
            Entry<K,V> next = e.next;
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Calculate the index in the new array from the node's hash
            int i = indexFor(e.hash, newCapacity);
            // Head-insert the bucket's elements one by one into the new array.
            // Note that the order inside the bucket is reversed: 1->2->3 becomes 3->2->1
            e.next = newTable[i];
            newTable[i] = e;
            e = next;
        }
    }
}

Expansion involves migrating the data: each node's position in the new array must be recalculated, and the threshold updated.
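The order reversal noted in the transfer comment can be demonstrated with a toy version of the head-insertion loop (hypothetical Node class, not the JDK's):

```java
public class TransferOrderDemo {
    static class Node {
        final int val; Node next;
        Node(int val, Node next) { this.val = val; this.next = next; }
    }

    // Re-links a chain into a new bucket using head insertion, as 1.7's transfer does.
    static Node headInsertAll(Node e) {
        Node newHead = null;
        while (e != null) {
            Node next = e.next;
            e.next = newHead;  // the current new head becomes this node's successor
            newHead = e;
            e = next;
        }
        return newHead;
    }

    public static void main(String[] args) {
        Node chain = new Node(1, new Node(2, new Node(3, null))); // 1->2->3
        Node moved = headInsertAll(chain);
        StringBuilder sb = new StringBuilder();
        for (Node n = moved; n != null; n = n.next) sb.append(n.val);
        System.out.println(sb); // 321: the bucket order is reversed
    }
}
```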

That covers put in JDK 1.7 — a small method hiding a lot. put is the core; get is simpler, so it is not analyzed here.

3.2 JDK 1.8

The basic idea is the same: an array plus linked lists (or red-black trees) holds the data.

transient Node<K,V>[] table;

// If a list grows longer than 8 and the table has at least 64 slots, the list is converted to a red-black tree
static final int TREEIFY_THRESHOLD = 8;

// In 1.8 the node class was renamed from Entry to Node
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}


Let's see how it implements the put method.

3.2.1 put

The 1.8 put method looks a little more complicated than 1.7's, but don't worry — we'll go through it line by line.

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods.
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;

    //1. If table is empty, create an array for initialization
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    //2. Find the index position of the element in the array based on the hash and the length of the array. If this is empty, put the node here
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        //3. If a node already exists at the index and the hash value and key value are equal (value needs to be replaced), record the node reference at the index
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        //4. If the index is a red-black tree, insert the node into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        //5. The index is a linked list
        else {
            //5.1 Iterate through the linked list
            for (int binCount = 0; ; ++binCount) {
                //5.2 Insert the node to the end of the list
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // If the list length exceeds 8, the tree is converted to a red-black tree
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // End the for loop if an equal key is found
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        //6. Replace the original value
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    //7. If the threshold is exceeded, expand the capacity
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

The comments are more detailed, and the difference from 1.7 is quite significant.

  • Inserting a node into a linked list uses head insertion in Java 7 and tail insertion in Java 8
  • In Java 8, a list longer than 8 in a table of at least 64 slots is treeified
  • Java 7 handles the null key with a dedicated method, while Java 8 does not treat it separately (in both versions the null key hashes to 0 and goes into bucket 0)

3.2.1.1 resize Initializes or expands the array

Let’s focus on the core code first.

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    // The old array length
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    // New array length new threshold
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the old capacity has already reached MAXIMUM_CAPACITY, set the threshold to Integer.MAX_VALUE.
        // In normal cases this branch is not taken
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        //1. Expand the array size by 2
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // The threshold is also *2
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        //2. Initialize arrays
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        //3. Iterate over the old array
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                //3.1 There is only one element in the bucket at the index. Calculate the node's position in the new array according to the hash of the node and the length of the new array, and then place the node in the new array
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                //3.2 The index is treated separately by red-black tree
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                //3.3 The slot holds a singly linked list (not treeified)
                else { // preserve order
                    // Nodes with (hash & oldCap) == 0 stay at the old index; loHead is the head, loTail the tail
                    Node<K,V> loHead = null, loTail = null;
                    // Nodes with (hash & oldCap) != 0 must move; hiHead is the head, hiTail the tail
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // AND the hash with the old array length (oldCap):
                        // the extra bit is either 0 or 1. If it is 1, the node moves to j + oldCap.
                        // Each list is split into at most two lists.
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    // These elements are still in the old index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // These elements are moved to the old index position +oldCap
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

resize handles both initialization and expansion, which makes it a little dense. Initializing the array is easy, so let's focus on the expansion logic. The first step is to traverse the old array; each index falls into one of three cases:

  1. There is only one element in the bucket at the index -> Find the position of the node in the new array based on the hash of the node and the length of the new array, and place it in the new array
  2. The index is red black tree -> separate processing
  3. The index holds a linked list with more than one node (and it has not been treeified)

The third case is more complicated, so let’s analyze it separately.

Before the analysis, one quick setup. Suppose the array length is n = 16. From the put code, the index into the array is (n - 1) & hash. Take two keys, key1 and key2, with hash values 5 and 21; here is how their indices are computed:

key1:
00001111 & 00000101 = 5

key2:
00001111 & 00010101 = 5

When the array is expanded, n = 32:

key1:
00011111 & 00000101 = 00101 = 5

key2:
00011111 & 00010101 = 10101 = 5 + 16 = 21

After the expansion, n - 1 has one more 1 bit than before, which moves key2 to its original position + 16 under the bitwise AND. Because expansion doubles the capacity (a power of 2), every element either stays at its original index or moves to its original index + oldCap.
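Continuing the example (a standalone check with the same two hashes and oldCap = 16):

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int h1 = 5, h2 = 21;                     // the two hashes from the example
        // New indices when doubling to 32:
        System.out.println(h1 & (32 - 1));       // 5: the node stays put
        System.out.println(h2 & (32 - 1));       // 21 = 5 + 16: the node moves
        // The 1.8 shortcut inspects only the one new bit:
        System.out.println((h1 & oldCap) == 0);  // true: stays at the old index
        System.out.println((h2 & oldCap) == 0);  // false: moves to old index + oldCap
    }
}
```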

With the above analysis in mind, let’s go back to our logic at 3.3

if ((e.hash & oldCap) == 0) {
    if (loTail == null)
        loHead = e;
    else
        loTail.next = e;
    loTail = e;
} else {
    if (hiTail == null)
        hiHead = e;
    else
        hiTail.next = e;
    hiTail = e;
}

e.hash & oldCap ANDs the element's hash with the old array length. If the previous length was 16, this ANDs with 10000; the normal put index uses (n - 1) = 01111, and after expansion it would use 11111. So the check simply inspects the one new bit of the hash: if it is 0, the node keeps its old position; if it is 1, the new position is the old position plus the old array length.

Why do it this way? Mainly because it is cheap to compute — no hash recalculation as in JDK 1.7. It also spreads the elements out: what used to be one chain now sits on two chains (at different array indexes), making lookups faster.

One small point: 1.8 uses tail insertion and keeps the order within a bucket, whereas JDK 1.7 uses head insertion and reverses it.

That's about it for JDK 1.8's put and resize.

4. Related knowledge

4.1 HashMap 1.7 is different from 1.8

  1. JDK 1.7 uses head insertion, JDK 1.8 uses tail insertion. During resize, 1.7 reverses the order within a bucket while 1.8 preserves it
  2. JDK 1.7 is array + linked list; JDK 1.8 is array + linked list + red-black tree
  3. In JDK 1.7, capacity is expanded before the new entry is inserted; in JDK 1.8, it is expanded after
  4. JDK 1.7 calls a node Entry, while JDK 1.8 calls it Node
  5. During expansion, JDK 1.7 recomputes each node's slot with hash & (length - 1), while JDK 1.8 only checks whether one hash bit is 0 or 1 to decide between the original index and the original index + old capacity
  6. JDK 1.7 perturbs the hash nine times (4 shifts + 5 XORs); JDK 1.8 only twice (1 shift + 1 XOR)

Ps: looking up an element in a red-black tree costs O(log n).

4.2 Differences between Hashtable and HashMap

  1. Hashtable does not allow null keys or values
  2. Hashtable synchronizes every method with synchronized (a performance cost)
  3. HashMap's iterator is fail-fast, while Hashtable's Enumeration is not
  4. Hashtable defaults to a capacity of 11 and does not require a power-of-two length, whereas HashMap defaults to 16 and must be a power of two
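The null-key difference in point 1 is easy to demonstrate with java.util directly:

```java
import java.util.HashMap;
import java.util.Hashtable;

public class NullKeyDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put(null, 1);                     // fine: the null key hashes to bucket 0
        System.out.println(map.get(null));    // 1

        Hashtable<String, Integer> table = new Hashtable<>();
        try {
            table.put(null, 1);               // Hashtable rejects null keys
        } catch (NullPointerException e) {
            System.out.println("NPE");        // NPE
        }
    }
}
```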

HashMap is the preferred choice for most key-value access scenarios. In multi-threaded environments, ConcurrentHashMap is recommended.
