The introduction

Alas!

Gold, nine, silver, ten seconds passed,

Being overwhelmed at the interview;

It’s my fault I don’t know how it works,

Make up for today only for over.

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

Elementary brother, elementary brother!

If you don’t work hard,

All learn rhyme;

Looking for a job recently,

Everything is not to be desired;

I’m here to give you a principle paper,

Make sure you don’t have to worry about HashMap.

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

Without further ado, let’s start explaining how HashMap works.

The body of the

The Hash table

A HashMap is a data structure that uses Hash tables. If you know your way around Hash tables, I’m sure you’re halfway there with HashMaps. Let’s talk about what a Hash table is first.

Here’s an example:

When we went to school, we all stood in line. When the teacher did not make requirements, we would stand together casually. There were no uniform rules, and even the behavior of class jumping. When a teacher starts looking for a man in a long line, he usually starts looking first until he finds the man he is looking for. You can think of this as a linked list or an array, and you're going to iterate through it one at a time. At this time teaching director came, said that such queuing is too messy, so formulated the queuing rules, must be in accordance with the grade + class + their own serial number order standing in line and remember this serial number. When the teacher again looking for a strange students, as long as the director of the teaching said I want to find two grade five class 006 students, director of the teaching information can directly find zhang SAN students standing position, do not need to traverse. This method is used to add or find Hash tables.Copy the code

Let’s explain the above example:

  • In the example, class 006, Grade 5, Grade 2, is the Hash value: a fixed length target value is computed based on the key value (a student in the example), and this target value is called the Hash value;

  • The queue rule formulated by the dean is a Hash function formulated: how to calculate the Hash value with the key? It’s the calculation of the Hash function; The implementation of each hash function is different in a project, and the implementation of most hash functions is not public because of possible security issues.

To put it simply, a hash table is a table structure that allows us to directly calculate the target position based on a given key value. In engineering, this table structure is usually implemented using arrays.

So what makes a good Hash function?

  • 1. Identity: also called hashing, that is, every Hash value should be as uniform as possible; Because when you store it in an array you want to store it as evenly as possible. (It can also be said that the hit ratio of each array subscript is as equal as possible.)

  • 2. Follow the avalanche effect: Hash values change dramatically when small keys are entered; Some Hash functions are used for encryption (such as HTTPS) to ensure greater security.

  • 3, try to avoid Hash collisions: when we want to put ten apples in nine drawers, there will always be at least two apples in the same drawer (drawer principle), there are two apples in the drawer, we can call it Hash collision, or Hash collision; Since the length of the Hash value is fixed and the key can be any value, the Hash value of two keys will always be the same, which is called Hash conflict. Hash conflict is inevitable, but we should minimize conflicts.

Now let’s see if we can make fun of the Hash function created by the teaching director in our example.

Therefore, the query speed of Hash table is fast. The time complexity of a good Hash table can reach O (1), which is very amazing for the huge amount of data.

A HashMap principle

JDK1.7

Let’s look at the implementation of HashMap in JDK1.7:

In JDK1.7, the HashMap is stored in the form of an array plus a linked list. The position of each subscript in the array can be called a hash bucket.

Put the process:

public V put(K key, V value) {
        if (key == null) / / 1
            return putForNullKey(value);
        int hash = hash(key); / / 2
        int i = indexFor(hash, table.length); / / 3
        for(Entry<K,V> e = table[i]; e ! =null; e = e.next) {/ / 4
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i); / / 5
        return null;
    }
Copy the code

1. Key cannot be empty:

Before storing an element, the HashMap determines whether the element’s key is empty and returns it if it is.

2. Compute the Hash value:

    final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) { / / 2.1
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }

        h ^= k.hashCode(); 

        / / 2.2
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
Copy the code

Using the hash source we can see:

2.1, where JDK1.7 rehashes elements of type String. Why should strings be rehashed?

Let’s start with an email… After JDK1.7 came out, Tomcat posted an email about a potential security vulnerability in HashMap in JDK1.7.

Mail-archives.apache.org/mod_mbox/to… 4 [email protected]

In this email, Tomcat uses Hashtable to store HTTP request parameters. As the hash algorithm of String is public, it may cause some professionals to obtain countless key values with the same HashCode. When these request parameters are requested simultaneously, The HASH table is degraded to a linked list and the CPU has a large number of linked lists. Due to the time complexity O(n) of the linked list search, the search speed is greatly affected and DDos attacks occur.

Of course, Tomcat also provided a solution, and this bug was later fixed in JDK1.7.

2.2, we see a lot of >>> and ^ operations at 2.2, the reason is to increase its identity, so that the probability of hitting each array index is the same.

3, obtain array subscript:

    static int indexFor(int h, int length) {
        return h & (length-1);
    }
Copy the code

In step 2, we got the Hash value, so how do we use the Hash value to determine which bucket to put in?

Some of us may think of the mod operation, but the mod operation has two disadvantages: 1, compared to Java & operation is relatively slow; 2. The mod of a negative number is still negative, and the array subscript (bucket location) cannot be negative.

So Java uses bitwise and, which allows you to calculate the location of the element bucket based on the array length and Hash value.

If length is not a power of 2, then the zero bit will always be zero every time you press the sum bit, resulting in some subscripts that will never be obtained, as shown in the figure below.

To solve this problem, if your value is not a power of 2, then in the constructor HashMap will find the lowest power of 2 greater than or equal to the initial capacity we assigned to the HashMap. So to reduce this conversion, we must initialize the capacity of the HashMap to a power of two.

4, let’s move on to comment 4: this is a circular linked list to insert elements.

5. We can see that addEntry() is called. What does this function do?

    void addEntry(int hash, K key, V value, int bucketIndex) {
        if ((size >= threshold) && (null! = table[bucketIndex])) { resize(2 * table.length);
            hash = (null! = key) ? hash(key) :0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }
Copy the code

So we can kind of get a sense of how this is actually a capacity expansion mechanism, but that’s all we need to know, and then we’ll talk about how it works.

Threshold: indicates the broad value. If the used size of the array is greater than or equal to this value, the array will be expanded. Threshold = capacity (array capacity) * load_factor

Load_factor: The default value is 0.75, which is a tradeoff between time and space costs, according to the official explanation.

Once you get into if, you execute the resize() function, which is arguably responsible for everything.

Let’s hit resize()

    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        if (oldCapacity == MAXIMUM_CAPACITY) { //MAXIMUM_CAPACITY defaults to the 30th power of 2, so it will rarely reach this capacity
            threshold = Integer.MAX_VALUE;
            return;
        }

        Entry[] newTable = new Entry[newCapacity];
        boolean oldAltHashing = useAltHashing;
        useAltHashing |= sun.misc.VM.isBooted() &&
                (newCapacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
        boolean rehash = oldAltHashing ^ useAltHashing;
        transfer(newTable, rehash);
        table = newTable;
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }
Copy the code
    void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null! = e) { Entry<K,V> next = e.next;if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                inti = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; }}}Copy the code

These two methods mainly rehash the array and insert the list back into the new array using headers. This is fine in a single thread, but if multiple threads are working at the same time, the list will loop. When the looped list is accessed again, an infinite loop, cpu100 percent, occurs, and even though the HashMap explicitly states that it is not thread-safe, someone still uses it, causing such a problem.

  • Here’s an article that explains exactly why circular lists are created: coolshell.cn/articles/96…

This is the PUT process for HashMap in JDK1.7. Let’s talk briefly about the GET process:

    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }
Copy the code

This is JDK1.7’s get method, which returns null if key is empty, otherwise getEntry() is called:

    final Entry<K,V> getEntry(Object key) {
        int hash = (key == null)?0 : hash(key);
        for(Entry<K,V> e = table[indexFor(hash, table.length)]; e ! =null;
             e = e.next) {
            Object k;
            if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
                return e;
        }
        return null;
    }
Copy the code

Compute the hash value of the key and loop through the linked list under the bucket to find the data.

JDK1.8

As mentioned above, the JDK1.7 HashMap has security concerns, such as the possibility of loiterable lists in multi-threading (this is entirely the user’s own pot, as the Java team explicitly states that HashMap is not thread-safe).

  • Therefore, JDK1.8’s HashMap uses the data structure of array plus linked list plus red-black tree to prevent the search time from degrading to O (n) when the linked list is too long.

The list will be replaced by a red-black tree when it reaches 8. The reason for choosing 8 is that the Java team explained:

Basically, they use the poisson distribution principle (which we’ve all learned in college mathematics), with only 0.00000006 probability when the chain is expressed at 8.

  • Improvements have also been made to the hash() function:
    static final int hash(Object key) {
        int h;
        return (key == null)?0 : (h = key.hashCode()) ^ (h >>> 16);
    }
Copy the code
  • Another obvious improvement is in theresize()In this function, the header method is changed to the tail method, so that in the case of multiple threads, there is no loop linked list, resulting in an infinite loop. It is worth noting, however, that HashMap is still not thread-safe, as is clearly stated in the HashMap annotation.

The end of the

This is an important part of the HashMap, and it is a common question asked during the interview process. If there’s anything else I haven’t mentioned, let me know in the comments and I’ll try to fill it in.

I also want to share a video explaining HashMap, which I think is quite clear:

A HashMap video

That’s all for this article, please don’t forget the triple link!

. This is not station B!