HashMap implementation principle step by step analysis (1- PUT method source code overall process)

Today I will share with you the internal implementation principle of HashMap, which is also a part of the basic part of the interview process that is asked a lot.

To understand the internals of a HashMap, we need to understand some basic concepts: what is a hash, what is a hash table, and what is a Hashcode? With these basic concepts in mind, it’s relatively easy to analyze a Hashmap.

What is a Hash

Hash is simply defined as converting an input of arbitrary length to an output of fixed length through a hash algorithm to establish a one-to-one correspondence. This output is commonly called a hash code or hash value.

Common Hash Algorithms

Here are some common hash algorithms in case you’re not quite sure:

Our MD5 encryption, for example, assume that your password is 1234 by MD5 encrypted 81 dc9bdb52d04dc20036dbd8313ed055 becomes a bunch of encryption. In the 1234 s and 81 dc9bdb52d04dc20036dbd8313ed055 is a corresponding relationship is established. 1234 future use MD5 encryption result is 81 dc9bdb52d04dc20036dbd8313ed055 fixed value, the value we called a hash value.

Another example is our ASSIC table, which is also A hash algorithm, where our character ‘A’ is mapped to the number 65. Our hash algorithm for ‘A’ always finds A fixed number 65, which we call the hash value.

The Hash table

A Hash table (also called a Hash table) is a data structure that is directly accessed based on Key values. That is, it accesses records by mapping key values to a location in the table to speed up lookups.

If we want to store multiple letters, store multiple values. In Java, you can do this using arrays. If we store the letters directly according to the array’s corner markers, we store them as follows.

So let’s say that we want to get the value of b, we’re going to have to go through each of these elements and see if it’s equal to b. So it’s less efficient.

Now you can use the hash value above to store the classic character. Each character has a Classic value. We can get the Classic value of the letter and store it. Put it in the corner where the assic value of the object is as shown below:

By storing in this way, we assume the content to be obtained. We just need to calculate the classic code value of A 97 and then directly take out the corresponding position of the word by arR [97]. There is no need to go through the groups again for comparison. Such an array is called a hash table, a data structure that is accessed directly based on Key values.

HashCode

If you’re wondering if I’m going to store the letter A, let’s say I only have 16 Spaces in my array and the range is 0 to 15. The classic code of letter A is 97. Won’t an exception occur if you store it directly? That’s true, so we don’t store the hash value as soon as we get it. Here we first take the remainder of the hash code value %16(16 is the length of our array) divided by 16, so that we can fix the range between 0 and 15,97%16 gets 1. We store the letter A in the position of 1 marker arr[1]= A and use the same method to get the classic code of A and then %16. The diagram below:

The number 1 computed in this is the corner marker stored in the array, and we can call it the hashcode of A, and hashcode is something that has a place in the hash table.

HashCode exists primarily for lookup speed, and HashCode is used to determine the storage address of an object in a hash storage structure.

The hashCode method in the Object class

In the Java language, every time the JVM creates a new Object, it drops it into a Hash table, so that the next time it compares or fetches an Object, it fetches it from the Hash table based on the Object’s Hashcode. The purpose of this is to improve the efficiency of fetching objects. Object objects have a special method: hashCode (). This method gets the corresponding hashCode value.

Internal data structure of HashMap:

With this knowledge in hand, let’s look at the implementation of HashMap. The internal implementation of HashMap has undergone a major change in JDK1.7 and 1.8.

The data structure used in JDK1.7 is: array + linked list

The data structures used in JDK1.8 are: array + linked list + red-black tree

In JDK1.7,1.8 is basically a red-black tree.

Implementation principle of HashMap

Let’s take a look at the basic use of hashMap, as shown below:

In the put method, the first parameter is key(key must be a reference data type) and the second parameter is value. This set of key: values is called Entry, which corresponds to an inner class in hashMap, as shown below:

Every time we put a set of keys and values, we’re given an Entry object. Place the created entry object into the array. In the hashMap source code, we can see the following properties, which define the initial size of the array and the empty array.

Our Entry objects will be placed in the table array as shown in the following figure.

When you put an element, it checks whether the array is empty and creates it if it is.

Line 26 helps us initialize the array. From the source of the PUT method, we can see that the internal array is lazily loaded. Let’s go into that method and look at it

When an Entry object is stored in an array, it is not stored directly as a corner marker. Storage is in the form of hash tables.

To put an element, the key of the entry pair must have a reference data type. The method hashCode () can be called when a reference data type is used. Returns a string of numbers of type int. It’s an object location that’s stored as a hash value in the array. This is a large string of numbers, and the array has only 16 bytes of storage space. Therefore, the result is limited to 0 to 15 by modulating %16. The process is shown as follows:

A hash collision may occur during a stored procedure

What is a hash collision

The so-called hash collision refers to the calculation of the same hash code after the hash algorithm, as shown in the following figure:

In hashMap, a linked list is introduced to address this situation, which means that when we look at the Entry inner class declaration above, there is a next attribute. If a hash conflict occurs, the corresponding data is recorded in the form of a linked list header.

For example, suppose that the location of corner 9 is now occupied by the zhang SAN entity, which is the first location placed on corner 9

Next time I’m going to store That, let’s say that if I hash that, it’s going to be 9

Linked lists are used to resolve hash conflicts

If a hash conflict occurs later, the list is iterated to see if there is the same entry object. If there is, the list is updated. If not, the object is inserted into the head of the list. If no hash conflict is generated, directly save the hash to the corresponding corner.

Put ();

This article introduces the content of hashmap, there are many more content, this article mainly to the internal hashmap array + linked list have a global understanding. There are a lot of problems we haven’t analyzed yet. For example: why the length of the array must be 2 power, the problem of expansion when adding elements, in 1.7 expansion to multi-data, why the CPU usage may be 100% in the case of multi-threading? Red black tree added to JDk1.8 These will be analyzed one by one next time.

HashMap implementation principle step by step analysis (1- PUT method source code overall process)

What is a Hash

Common Hash Algorithms

The Hash table

HashCode

The hashCode method in the Object class

Internal data structure of HashMap:

Implementation principle of HashMap

What is a hash collision

Related Posts

What is the difference between user mode and kernel mode

Simulate a simple shell

What is LXCFS? LXCFS implements the best practice of isolating container resource views