Thinking Logic for computer programs (40) - Dissecting HashMap

This series of articles have been supplemented and improved, and has been revised and organized into a book, The Logic of Java Programming (written by Ma Junchang), published by Huazhang Branch of China Machine Press, which was on the market in January 2018 and received high praise from readers. Major online shops and bookstores are available for sale, welcome to buy: JINGdong self-run link

The previous two sections introduced ArrayList and LinkedList. One common feature of them is that they are inefficient in searching for elements and need to be compared one by one. This section introduces HashMap, which is much more efficient in searching. How does it work? How is it done? This section provides details.

Literally, a HashMap consists of two words, Hash and Map. Map is not a Map, but a mapping relationship. It is an interface.

Next, we’ll look at the Map interface, then how to use HashMap, then how to implement it, and finally we’ll summarize the features of HashMap.

The Map interface

The basic concept

A Map has the concepts of keys and values. A Map stores and accesses values based on keys. Keys cannot be duplicated, that is, only one copy of a key is stored. Using maps makes it easy to handle scenarios where objects need to be accessed by key, such as:

A dictionary application, the key can be a word, the value can be word information class, including meaning, pronunciation, example sentences and so on.
Count and record the number of occurrences of all words in a book. You can use words as keys and occurrences as values.
Manage configuration items in a configuration file. Configuration items are typical key-value pairs.
Query personnel information according to id number, ID number is the key, personnel information is the value.

Arrays, ArrayLists, and LinkedLists can be considered special maps with keys as indexes and values as objects.

The interface definition

The Map interface is defined as:

public interface Map<K.V> {
    V put(K key, V value);
    V get(Object key);
    V remove(Object key);
    int size(a);
    boolean isEmpty(a);
    boolean containsKey(Object key);
    boolean containsValue(Object value);
    void putAll(Map<? extends K, ? extends V> m);
    void clear(a);
    Set<K> keySet(a);
    Collection<V> values(a);
    Set<Map.Entry<K, V>> entrySet();
    interface Entry<K.V> {
        K getKey(a);
        V getValue(a);
        V setValue(V value);
        boolean equals(Object o);
        int hashCode(a);
    }
    boolean equals(Object o);
    int hashCode(a);
}
Copy the code

The Map interface has two type parameters, K and V, which represent the types of Key and Value respectively. Let’s explain the methods.

Save key-value pairs

V put(K key, V value);
Copy the code

Key saves the value value. If a key already exists in the Map, the Map overwrites the corresponding value and returns the original value. If no key exists in the Map, the Map returns null. Keys are identical in that they are either null or the equals method returns true.

Get values based on keys

V get(Object key);
Copy the code

If not found, null is returned.

Delete key-value pairs by key

V remove(Object key);
Copy the code

Returns the original value of the key, or null if no key exists in the Map.

View the size of the Map

int size(a);
boolean isEmpty(a);
Copy the code

Check to see if a key is included

boolean containsKey(Object key);
Copy the code

Check to see if a value is included

boolean containsValue(Object value);
Copy the code

Batch saving

void putAll(Map<? extends K, ? extends V> m);
Copy the code

Saves all key-value pairs in parameter m to the current Map.

Clear all key-value pairs in the Map

void clear(a);
Copy the code

Gets the collection of keys in the Map

Set<K> keySet(a);
Copy the code

A Set is an interface that represents the mathematical concept of a Set, that is, a Set of elements that are not repeated. It is defined as:

public interface Set<E> extends Collection<E> {}Copy the code

It extends Collection without defining any new methods, but it requires all implementers to ensure that the Set has semantic constraints, i.e., no repeating elements. We’ll talk more about Set in the next section.

There are no duplicate keys in the Map, so ketSet() returns a Set.

Gets the collection of all values in the Map

Collection<V> values(a);
Copy the code

Gets all key-value pairs in the Map

Set<Map.Entry<K, V>> entrySet();
Copy the code

Map.Entry

is a nested interface defined inside the Map interface and represents a key-value pair. The main methods are as follows:
,v>

K getey(a);
V getValue(a);
Copy the code

KeySet ()/values()/entrySet() have a common feature. They all return views, not copied values. Changes based on the returned values directly modify the Map itself, for example:

map.keySet().clear();
Copy the code

All key-value pairs are deleted.

HashMap

Using the example

HashMap implements the Map interface. Let’s take a look at how to use it with a simple example.

In the random section, we introduced how to generate random numbers. Now, let’s write a program to see if random numbers are uniform. For example, we can randomly generate 1000 numbers from 0 to 3 and count the number of times each number is generated. The code could be written like this:

Random rnd = new Random();
Map<Integer, Integer> countMap = new HashMap<>();

for(int i=0; i<1000; i++){
    int num = rnd.nextInt(4);
    Integer count = countMap.get(num);
    if(count==null){
        countMap.put(num, 1);
    }else{
        countMap.put(num, count+1); }}for(Map.Entry<Integer, Integer> kv : countMap.entrySet()){
    System.out.println(kv.getKey()+","+kv.getValue());
}
Copy the code

The output of a run is:

0269 1236 2261 3234Copy the code

The code is pretty simple, so I won’t explain it.

A constructor

In addition to the default constructor, HashMap has the following constructor:

public HashMap(int initialCapacity)
public HashMap(int initialCapacity, float loadFactor)
public HashMap(Map<? extends K, ? extends V> m)
Copy the code

The last one is constructed with an existing Map and copies all the key-value pairs into the current Map, which is easy to understand. The first two involve two two-parameter initialCapacity and loadFactor. What do they mean? We need to take a look at how HashMap works (based on JDK 7).

Realize the principle of

An internal

Inside a HashMap there are several main instance variables:

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
transient int size;
int threshold;
final float loadFactor;
Copy the code

Size indicates the actual number of key-value pairs.

Table is an array of Entry types, where each element points to a one-way linked list. Each node in the linked list represents a key-value pair. Entry is an inner class, whose instance variables and constructor code are as follows:

static class Entry<K.V> implements Map.Entry<K.V> {
    final K key;
    V value;
    Entry<K,V> next;
    int hash;

    Entry(inth, K k, V v, Entry<K,V> n) { value = v; next = n; key = k; hash = h; }}Copy the code

Where key and value represent the key and value respectively, next refers to the next Entry node, hash is the hash value of the key, and we will introduce its calculation method in a moment. Directly storing hash value is to speed up the calculation during comparison, and we will look at the code in a moment.

The initial value of table is EMPTY_TABLE, which is an empty table.

static finalEntry<? ,? >[] EMPTY_TABLE = {};Copy the code

When a key/value pair is added, the table is no longer empty and will be expanded as the key/value pair is added. The expansion strategy is similar to that of an ArrayList. When the first element is added, the default size allocated is 16.

Threshold indicates the threshold. If the number of key-value pairs size is greater than or equal to threshold, the value can be extended. How does threshold work out? In general, threshold is equal to table.length multiplied by loadFactor. For example, if table.length is 16 and loadFactor is 0.75, threshold is 12.

LoadFactor is a loadFactor that indicates how much of the table as a whole is occupied. LoadFactor is a floating point number that defaults to 0.75 and can be modified using the constructor.

Let’s take a look at some of the main method code to see how HashMap uses this internal data to implement the Map interface. Look at the default constructor first. It should be noted that we may omit some non-major code for clarity and simplicity.

Default constructor

The code is:

public HashMap(a) {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
Copy the code

DEFAULT_INITIAL_CAPACITY is 16 and DEFAULT_LOAD_FACTOR is 0.75. The main constructor code for the default constructor call is:

public HashMap(int initialCapacity, float loadFactor) {
    this.loadFactor = loadFactor;
    threshold = initialCapacity;
}
Copy the code

Set the initial values of loadFactor and threshold.

Save key-value pairs

Let’s see how HashMap stores a key-value pair as follows:

public V put(K key, V value) {
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key);
    int i = indexFor(hash, table.length);
    for(Entry<K,V> e = table[i]; e ! =null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}  
Copy the code

If it is saved for the first time, the inflateTable() method is called to allocate actual space to the table. The main code of inflateTable is as follows:

private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);

    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
}
Copy the code

By default, the value of capacity is 16, the threshold changes to 12, and the table allocates an Entry array of 16 lengths.

Next, check to see if the key is null, and if so, call putForNullKey alone, which we’ll ignore for the moment.

If the key is not null, the hash method is used to calculate the hash value of the key. The code for the hash method is as follows:

final int hash(Object k) {
    int h = 0
    h ^= k.hashCode();
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
Copy the code

Based on the return value of the Key’s own hashCode method, some bit operations are performed for the sake of randomness and uniformity.

Once you have the hash value, call indexFor to calculate where to place the key-value pair in the table as follows:

static int indexFor(int h, int length) {
    return h & (length-1);
}
Copy the code

In HashMap, length is a power of 2, and h&(length-1) is equivalent to the modulo operation: h%length.

Table [I] = table[I] = table[I] = table[I] = table[I]

for(Entry<K,V> e = table[i]; e ! =null; e = e.next) 
Copy the code

Equals equals equals equals equals equals equals equals equals equals equals equals equals

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
Copy the code

Why compare hash in the first place? Since hash is an integer, the comparison performance is generally much higher than equals, and unlike hash, there is no need to call equals, which improves comparison performance overall.

If it can be found, directly change the value in the Entry.

ModCount++, as described in ArrayList and LinkedList, records the number of changes to facilitate detection of structural changes during iterations.

If none is found, the addEntry method is called to add an entry at the given location with the code:

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null! = table[bucketIndex])) { resize(2 * table.length);
        hash = (null! = key) ? hash(key) :0;
        bucketIndex = indexFor(hash, table.length);
    }

    createEntry(hash, key, value, bucketIndex);
}
Copy the code

If the space is sufficient and resize is not required, call createEntry to add the space.

void createEntry(int hash, K key, V value, int bucketIndex) {
    Entry<K,V> e = table[bucketIndex];
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
Copy the code

The code is straightforward: create an Entry object, insert the head of the one-way list, and increase size.

If the space is insufficient, that is, size is about to exceed threshold and objects have been inserted into the corresponding table position, the specific check code is as follows:

if ((size >= threshold) && (null! = table[bucketIndex]))Copy the code

Then call the resize method to expand the table. The expansion strategy is multiplied by 2. The main code of resize is as follows:

void resize(int newCapacity) {
    Entry[] oldTable = table;
    int oldCapacity = oldTable.length;
    Entry[] newTable = new Entry[newCapacity];
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    table = newTable;
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
Copy the code

Allocate an Entry array with twice the size, call the Transfer method to migrate the original key-value pair, and then update the internal table variable and the value of threshold. The code of transfer method is:

void transfer(Entry[] newTable, boolean rehash) {
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        while(null! = e) { Entry<K,V> next = e.next;if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            inti = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; }}}Copy the code

The rehash argument is generally false. This code iterates over each of the original key-value pairs, computes the new location, and saves it to the new location.

The above is the main code for saving key-value pairs. Briefly summarize the basic steps as follows:

Compute the hash value of the key
According to the hash value to save position (mold)
Insert into the head of the list at the corresponding position or update an existing value
Expand the table size as needed

The above description may be abstract, but let’s take a look at it using an example and a diagram. The code is:

Map<String,Integer> countMap = new HashMap<>();
countMap.put("hello".1);
countMap.put("world".3);
countMap.put("position".4);
Copy the code

After creating an object with new HashMap(), the graphic structure in memory looks something like this:

countMap.put("hello".1);
Copy the code

The hash value of “hello” is 96207088 and the result of module 16 is 0, so insert the head of the linked list pointed to by table[0] and the memory structure will change to:

Get values based on keys

The code is:

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}
Copy the code

GetForNullKey (); getEntry(); getForNullKey(); The node’s getValue() method is then called to get the value. The code for the getEntry method is:

final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }

    int hash = (key == null)?0 : hash(key);
    for(Entry<K,V> e = table[indexFor(hash, table.length)]; e ! =null;
         e = e.next) {
        Object k;
        if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
            return e;
    }
    return null;
}
Copy the code

The logic is simple:

Compute the hash value of the key as follows:

int hash = (key == null)?0 : hash(key);
Copy the code

Find the corresponding linked list in the table according to the hash code:

table[indexFor(hash, table.length)];
Copy the code

Iterating through the list to find, iterating through the code:

for(Entry<K,V> e = table[indexFor(hash, table.length)]; e ! =null;
       e = e.next)
Copy the code

Compare one by one, using hash to compare quickly, using equals to compare the same hash.

if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
Copy the code

Check to see if a key is included

ContainsKey has the same logic as get. If the node is not null, it exists.

public boolean containsKey(Object key) {
    returngetEntry(key) ! =null;
}
Copy the code

Check to see if a value is included

A HashMap can be used to efficiently manipulate by key, but if you want to manipulate by value, you need to iterate. The containsValue method has the following code:

public boolean containsValue(Object value) {
    if (value == null)
        return containsNullValue();

    Entry[] tab = table;
    for (int i = 0; i < tab.length ; i++)
        for(Entry e = tab[i] ; e ! =null ; e = e.next)
            if (value.equals(e.value))
                return true;
    return false;
}
Copy the code

If the value is null, containsNullValue is called separately. If the value is not null, containsNullValue is called separately. If the value is not null, containsNullValue is called separately.

Delete key-value pairs by key

The code is:

public V remove(Object key) {
    Entry<K,V> e = removeEntryForKey(key);
    return (e == null ? null : e.value);
}
Copy the code

The code for removeEntryForKey is:

final Entry<K,V> removeEntryForKey(Object key) {
    if (size == 0) {
        return null;
    }
    int hash = (key == null)?0 : hash(key);
    int i = indexFor(hash, table.length);
    Entry<K,V> prev = table[i];
    Entry<K,V> e = prev;

    while(e ! =null) {
        Entry<K,V> next = e.next;
        Object k;
        if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k)))) {
            modCount++;
            size--;
            if (prev == e)
                table[i] = next;
            else
                prev.next = next;
            e.recordRemoval(this);
            return e;
        }
        prev = e;
        e = next;
    }

    return e;
}
Copy the code

The basic logic is:

Calculate the hash and find the table index based on the hash. The code is as follows:

int hash = (key == null)?0 : hash(key);
int i = indexFor(hash, table.length);
Copy the code

Traverse the table[I] to find the point to be deleted, and use the variable prev to point to the previous node, next to point to the next node, and e to point to the current node. The traversal structure code is as follows:

Entry<K,V> prev = table[i];
Entry<K,V> e = prev;
while(e ! = null) { Entry<K,V> next = e.next;if{// deletereturn;
    }
    prev = e;
    e = next;
}
Copy the code

To determine whether or not we find the hash, we still compare the hash to equals
The logic of deleting is to make the length smaller and then connect the nodes before and after the point to be deleted. If the point to be deleted is the first node, then make the table[I] point directly to the next node. The code is:

size--;
if (prev == e)
    table[i] = next;
else
    prev.next = next;
Copy the code

e.recordRemoval(this); The code is empty in HashMap, mainly for use by subclass extension of HashMap.

Summary of implementation principle

The above is the basic implementation principle of HashMap. There is an array table inside, and each element table[I] points to a one-way linked list. According to the key memory value, hash is calculated with the key, and the index position in the array buketIndex is obtained by modulo. Then operate on the one-way linked list pointed to by table[buketIndex].

Hash () = equals (); hash () = equals (); hash () = equals (); hash () = equals (); We need to pay special attention to this point. This is also a key constraint on the hashCode and Equals methods, a constraint we discussed when we introduced wrapping classes.

Analysis of HashMap features

HashMap implements the Map interface, internally using array linked list and hash, which determines its characteristics as follows:

The efficiency of saving and obtaining values by key is very high, which is O(1). Each one-way linked list usually has only one or a few nodes, which can be directly and quickly located according to the hash value.
There is no order in key-value pairs in a HashMap because the hash values are random.

A HashMap is ideal if you often need to value by key, and you don’t need the order.

summary

This section introduces the usage and implementation principle of HashMap. It realizes the Map interface and can conveniently store values by keys. Its implementation uses hash and can directly locate according to the keys themselves, with high access efficiency.

Accessing and comparing objects according to hash value is an important way of thinking in computer program. It makes the accessing objects mainly depend on their own hash value, rather than comparing with other objects. The accessing efficiency is independent of the size of set, up to O(1).

However, HashMap has no order. If you want to maintain the order in which you add HashMap, you can use a subclass of HashMap called LinkedHashMap, which we’ll cover later. Map also has an important implementation class, TreeMap, that can sort, which will be covered in a later section.

This section mentions the Set interface, and in the next section, we’ll explore one of its important implementation classes, HashSet.

To be continued, check the latest articles, please pay attention to the wechat public account “Lao Ma said programming” (scan the qr code below), simple and simple, Lao Ma and you explore the nature of Java programming and computer technology. All rights reserved.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Thinking Logic for computer programs (40) – Dissecting HashMap

The Map interface

The basic concept

The interface definition

HashMap

Using the example

A constructor

Realize the principle of

An internal

Default constructor

Save key-value pairs

Get values based on keys

Check to see if a key is included

Check to see if a value is included

Delete key-value pairs by key

Summary of implementation principle

Analysis of HashMap features

summary

Thinking Logic for computer programs (40) – Dissecting HashMap

The Map interface

The basic concept

The interface definition

HashMap

Using the example

A constructor

Realize the principle of

An internal

Default constructor

Save key-value pairs

Get values based on keys

Check to see if a key is included

Check to see if a value is included

Delete key-value pairs by key

Summary of implementation principle

Analysis of HashMap features

summary

Related Posts

Algorithms of the Day – Linked List (1) : How to implement LRU cache elimination algorithm?

3 Minutes essay: Unselfish Sharing! Laravel model USES two small skills | the nuggets essay – double festival special articles

Design Pattern 22- Prototype pattern