If you’re a Java programmer, you’re probably familiar with HashMap, and it comes up in almost every interview.

If that sounds like you, this article is for you.

The underlying data structure of HashMap

The underlying data structure of HashMap in 1.7 differs somewhat from 1.8, but both are built mostly on arrays and linked lists. Version 1.7 looks like this:

Version 1.8 looks like this:

You can clearly see that 1.8 adds a tree structure: the red-black tree.

What are the shortcomings of the 1.7 implementation of HashMap?

Version 1.7 uses array + linked list, so if hash collisions keep landing in the same bucket, that bucket's linked list grows longer and longer, and lookups on it become very slow. Optimizing this is exactly what version 1.8 does.

Version 1.8 uses array + linked list + red-black tree: when the length of a bucket's list exceeds 8 (TREEIFY_THRESHOLD) and the table already has at least 64 buckets (MIN_TREEIFY_CAPACITY), the list is converted into a red-black tree; below 64 buckets, the map resizes instead.

So the question is: why is the list-to-red-black-tree threshold set to 8?

Because the average lookup cost in a linked list is about n/2 comparisons, while in a red-black tree it is about log n. When n = 8, log2(8) = 3, which is less than 8/2 = 4, so the red-black tree is faster to search.
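The arithmetic above can be checked directly. A minimal sketch (the class name is made up for illustration; n/2 and log2 n are average-case comparison counts):

```java
// Average comparisons to find a node among n = 8 entries:
// a linked list scans about n/2 nodes, a red-black tree about log2(n).
public class TreeifyMath {
    public static void main(String[] args) {
        int n = 8;
        double listCost = n / 2.0;                   // 4.0
        double treeCost = Math.log(n) / Math.log(2); // ~3
        System.out.println("list ~ " + listCost + ", tree ~ " + treeCost);
        System.out.println(treeCost < listCost);     // true
    }
}
```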

Why is the tree-to-list threshold 6 and not 7? Converting a linked list to a red-black tree and back again is expensive, especially list-to-tree, which requires rotations. Leaving a gap between the two thresholds (treeify at 8, untreeify at 6) keeps a bucket that hovers around the boundary from converting back and forth constantly.

Why turn linked lists into red-black trees instead of balanced binary trees (AVL trees)?

  • Because AVL trees maintain a stricter balance than red-black trees: the path from root to deepest leaf is at most about 1.44·log2(n + 2) in an AVL tree, versus at most 2·log2(n + 1) in a red-black tree, so AVL lookups are faster. For lookup-intensive workloads an AVL tree is the better choice; for insert-intensive workloads a red-black tree does better
  • AVL trees must store a balance factor at every node, which costs extra space
  • AVL rebalancing is harder to implement and debug than red-black rebalancing. Although both offer O(log n) lookups, an AVL tree may need O(log n) rotations after a deletion, while a red-black tree needs at most a constant number of rotations (at most two on insertion) to restore balance

Why is HashMap thread unsafe?

The thread-unsafety of HashMap shows up mainly in two ways: an infinite loop caused by resizing, and data overwrites.

The infinite loop only occurs in version 1.7 and earlier, because those versions transferred entries during a resize using head insertion, which under concurrency can produce a circular linked list.

When is a resize triggered? A resize happens when the number of stored entries exceeds the current capacity × loadFactor (0.75 by default). For example, with capacity 16, 16 × 0.75 = 12, so when the 13th entry is stored the map decides it needs to grow, and the HashMap resizes first.
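The trigger condition can be sketched in a couple of lines (the constants match HashMap's documented defaults; the class name is made up for illustration):

```java
// Resize trigger: a HashMap grows once size exceeds capacity * loadFactor.
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;          // HashMap's default initial capacity
        float loadFactor = 0.75f;   // HashMap's default load factor
        int threshold = (int) (capacity * loadFactor);
        System.out.println("threshold = " + threshold); // threshold = 12
        // Storing the 13th entry pushes size past 12, so a resize happens.
    }
}
```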

Resizing is not simply enlarging the original array in place. A new, empty entry array twice the length of the old one is created, and then a rehash is performed: every entry in the old array is recomputed into its slot in the new array.
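In 1.8 the rehash is cheap because capacities are powers of two: an entry's new index is either its old index, or its old index plus the old capacity, depending on a single bit of the hash. A small sketch (the hash value 21 is an arbitrary example, not from the source):

```java
// Power-of-two rehash: index = hash & (capacity - 1).
public class RehashDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int hash = 21;                     // arbitrary example hash
        int oldIdx = hash & (oldCap - 1);  // index in the old array
        int newIdx = hash & (newCap - 1);  // index in the new array
        // The extra bit (hash & oldCap) decides whether the entry stays
        // at oldIdx or moves to oldIdx + oldCap in the doubled array.
        System.out.println(oldIdx + " -> " + newIdx);  // 5 -> 21
        System.out.println(newIdx == oldIdx + oldCap); // true
    }
}
```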

Suppose we now have an Entry array of size 2. When we insert the second element, the size exceeds 2 × 0.75, so a resize occurs, as shown in the following figure:

After the resize completes, elements transferred later end up at the head of their bucket because of head insertion, so the list order is reversed and might look like this:

Under concurrent resizing, thread interleaving can leave the links pointing both ways at once: A.next = B and B.next = A, a circular linked list.

Traversing that bucket then obviously loops forever. The good news is that since version 1.8, tail insertion is used during the transfer, which eliminates this problem.
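The reversal itself is easy to see even single-threaded. Below is a simplified sketch of 1.7-style head insertion during transfer (the Node class is reduced to the bare minimum and this is not the real JDK source; the key point is that head insertion reverses the bucket's order, and with two threads interleaving inside the transfer, the reversed links can close into a cycle):

```java
import java.util.ArrayList;
import java.util.List;

public class HeadInsertDemo {
    static class Node { String key; Node next; Node(String k) { key = k; } }

    public static void main(String[] args) {
        // old bucket: A -> B
        Node a = new Node("A"); Node b = new Node("B"); a.next = b;

        Node newHead = null;
        for (Node e = a; e != null; ) {
            Node next = e.next;
            e.next = newHead;   // head insertion: link current node in front
            newHead = e;
            e = next;
        }
        // new bucket is now B -> A: the order has been reversed
        List<String> order = new ArrayList<>();
        for (Node e = newHead; e != null; e = e.next) order.add(e.key);
        System.out.println(order); // [B, A]
    }
}
```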

There is one issue that 1.8 does not fix, and that is data overwriting.

Let's say threads A and B both perform a put at the same time, and their keys hash to the same bucket, which is currently null. Thread A checks the bucket, sees null, and is suspended before it can insert; thread B inserts its entry normally; then thread A gets its CPU time slice back and inserts its own entry, overwriting thread B's data.

Since HashMap does not lock the put operation, there is no guarantee that the value a thread just read has not been modified by another thread in the meantime, so HashMap is not thread-safe.
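The root cause is an unsynchronized check-then-act. A stripped-down sketch of the pattern (TinyMap is a made-up class for illustration, not the JDK code; collisions and resizing are omitted):

```java
// TinyMap mimics the unsafe check-then-act inside HashMap's put.
public class TinyMap {
    Object[] table = new Object[16];

    void put(int hash, Object value) {
        int i = hash & (table.length - 1);
        if (table[i] == null) {   // thread A sees an empty slot...
            // ...is suspended here while thread B fills the same slot...
            table[i] = value;     // ...then resumes and overwrites B's entry
        }
    }

    public static void main(String[] args) {
        TinyMap m = new TinyMap();
        m.put(5, "from-A");
        System.out.println(m.table[5]); // from-A
    }
}
```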

Since HashMap is not thread-safe, what would you recommend instead?

ConcurrentHashMap is recommended. One interesting thing about ConcurrentHashMap is that its 1.7 and 1.8 implementations are quite different.
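A minimal usage sketch (thread and key counts are arbitrary choices for the demo): several threads put disjoint keys into one ConcurrentHashMap with no external locking, and no update is lost:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < 4; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    map.put(id * 1000 + i, i); // disjoint keys per thread
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        System.out.println(map.size()); // 4000: no entry was lost
    }
}
```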

In version 1.7, ConcurrentHashMap used ReentrantLock + Segment + HashEntry: the map is divided into segments, and each Segment gets its own lock, which supports access from multiple threads. But the granularity of this lock is too coarse, because every lock covers an entire segment.

Version 1.8 therefore optimized this, switching to CAS + synchronized + Node + red-black tree, which shrinks the lock granularity down to a single bucket's head node. Compared with ReentrantLock, synchronized here also saves memory.

From HashMap, the discussion can extend to the differences between HashMap and Hashtable, the 1.7 versus 1.8 implementations of ConcurrentHashMap, and so on.
