1. Data structure of HashMap

HashMap<String,String> map=new HashMap();
map.put("1"."Kobe");
Copy the code

These two lines of code indicate that the data has been stored in the HashMap. This raises the question, how can data be stored efficiently in a HashMap?

Starting with this question, we should first look at the underlying data structure of the HashMap.

HashMap: array + linked list + red-black tree JDK1.8

We all know that a HashMap is a container for storing key and value pairs. Should we put a key,value or both in each cell?

If you have seen the source code, you will know that there is an object-oriented idea, the [key,value] encapsulation

class Node{
	private String key;
	privateA String value. }Copy the code

Each cell is a new Node, and if you want to implement them, you only need to modify the Node.

Node[] table=new Node[24];  // represents an array
Copy the code
class Node{
	private String key;		// indicates a single necklace list
	privateA String value. Node next; }Copy the code
class TreeNode entends Node{     // A pseudo-code representation of a red-black tree
	parent;
	left;
	right;
}
Copy the code

Hash functions and collisions

When we retrieve the data and store it into the HashMap, we need to determine the position of the Node object consisting of the key and value in the index subscript of the array.

If you want to get the location, you need to:

  • Array length length
  • Get an integer [0 —- length-1]

(1) We may first think of using random.nextint (length); But then there are two problems:

  1. There’s too much chance of random repetition
  2. There was no basis for the search

(2) In this case, hashCode comes into play:

  1. I get an integer
intHash = key. HashCode () - >32bit0and1If we use an example to express:1HashCode may exceed the storage rangeCopy the code
  1. Controls the range of the integer
In this case, you need to control the hash value range of the integer: hash%length = required rangeCopy the code

But even this can cause problems. The hash = key. HashCode (); If the key value is 31,47 and so on, the hash%16 result will always be 1, so the Node object is more likely to go to the same location, and the storage resources will be wasted.

To make the result of index as non-repetitive as possible, you need a different form of calculation: hash & Length-1

The result is also between 0 and 15, which is the same as the result of modulo operation.

But even then, different hashes might produce the same index:In this case, xOR is required for the lower 16 bits and the higher 16 bits of the original hash value:

Hash function: Perform an xor operation 16 bits higher and 16 bits lower on key.hashcode (), resulting in a hash value that is much less likely to duplicate the last few bits.

####hash collision In hash&(n-1), the index result is a collision if it is repeated.Insert a picture describing the form OP2: Length-1 — > 01111. If it is not of this form, the final result of the calculation will be 0 regardless of the last value of OP1, which increases the probability of repetition.

Index actually depends on OP1, because op2 has 1 bits except for the first one, which means that the array size must be 10000-1=0111 (a power of 2).

3. The process of PUT

When we create a HashMap, we need to put, or store data into it. To put, you need to hash the key value, which is used to determine the position later. After completing the hash function and maintaining a few variables, you can begin the concrete PUT process.

  1. Check if the Node array is initialized. If not, it needs to be initialized.
if ((tab = table) == null || (n = tab.length) == 0)
	n = (tab = resize()).length;
Copy the code

The resize() method is initialized

newCap = DEFAULT_INITIAL_CAPACITY;		// Default array size is 16
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);		//16 x 0.75=12 Capacity expansion standard

Node<K,V> newTab = (Node<K,V>[])new Node[newCap];		// Initialize the array
Copy the code

2. Calculate the subscript position of the Node based on the hash result of the hash function and start saving data.

If the subscript position of the calculated Node is 1, check whether there is a Node at 1. If not, create a Node object and place it in the array. If there is an element in position 1, it can be divided into three situations: (1) If the key value is the same, the value value is directly replaced. (2) If the key value is different, the value is stored in a linked list. (3) If the key value is different, the value is stored in a red-black tree

else {
	Node<K,V> e; K k;
	// If the key value is the same, replace the value directly
	if(p.hash == hash && ((k = p.key) == key || (key ! =null && key.equals(k))))
	e=p;
	// If the key value is different, it is stored in a linked list
	else if (p instanceof TreeNode)
		e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
	else {
		————> loop through the current list until the last Node in the current list is found, next==null, place the new Node after the last Node
		for (int binCount = 0; ; ++binCount){
			if ((e = p.next) == null){
				p.next = newNode(hash, key, value, null);
				// Whenever a new node is added, check whether the length is greater than 8
				if (binCount >= TREEIFY_THRESHOLD - 1)
					// List to red black tree
					treeifyBin(tab, hash);
				break;
			}
			if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
			break; p = e; }}if(e ! =null) {
		V oldValue = e.value;
		if(! onlyIfAbsent || oldValuenull)
			e.value = value;
		afterNodeAccess(e);
		returnoldValue; }}Copy the code

Note: If the list is longer than 8, it becomes a red-black tree; If the nodes in a red-black tree are less than 6, it becomes a linked list.

4. Expansion of HashMap

When the size of the array is insufficient for storage, the HashMap needs to be expanded.

The method of expansion is to create a new array and migrate the [linked list, red-black tree] from the old array to the new array.

Note: Ensure that the capacity expansion is a multiple of 2, for example, 16 – > 32, in accordance with the law of the power of 2.

And under what circumstances does that happen?

If the array size is 16, expansion occurs when the number of nodes in the entire data structure exceeds 16*0.75=12.

// 0.75 is the load factor
static final float DEFAULT_LOAD_FACTOR = 0.75 f;

if (++size > threshold)		// The threshold is 16*0.75
	resize();				// Function: initialize/expand

// MAXIMUM_CAPACITY is 2^30. If the old array is larger than this, there is no need to expand
if (oldCap >= MAXIMUM_CAPACITY) {
	threshold = Integer.MAX_VALUE;
	return oldTab;
}

// If not, shift the size of the old array one bit to the right
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY)
	// The new array is used, so the expansion standard is doubled to 24
	newThr = oldThr << 1;

Copy the code

When these two parameters are obtained, a new array can be created:

Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
Copy the code

Then we need to migrate the nodes of the old array to the new array:

  1. Loop over the indices of the old array
  2. Determine if there is an element at the current subscript position. If there is an element, it is worth migrating
  3. If there’s an element at the subscript position, and there’s no element below it
  4. If I have an element down here, and it’s a red-black tree
  5. If I have an element down here, and it’s a linked list
if(oldTab ! =null) {
	// Loop over the index of the old array
	for(int j = 0; j < oldCap; ++j) { Node<K,V> e;// Determine if there are elements at the current subscript position
		if((e = oldTab[j]) ! =null) {
			oldTab[j] = null;
			// If there are elements at the subscript position and there are no elements below
			if (e.next == null)
				// Get the location of Node's new array index
				newTab[e.hash & (newCap - 1)] = e;
			else if (e instanceof TreeNode)
				// If there is an element below, and it is a red-black tree
				((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
				else {
					Node<K,V> loHead = null, lotail = null;
					Node<K,V> hiload = null,hiTail = null;
					Node<K,V> next;
					// If there are elements below, and it is a linked list
					do {
						next = e.next;
						// Nodes at position I in the old list are saved to the corresponding position I in the new array
						if ((e.hash &oldCap) == 0) {
							if (loTail == null)
								loHead = e;
							else
								loTail.next = e;
								loTail = e;
						}
						// Nodes at position I in the old list are saved to the corresponding position 1+16=17 in the new array
						else {
							if (hiTail == null)
								hiHead = e;
							elsehiTail.next = e; hiTail = e; }}while((e = next) ! =null);
					if(loTail ! =null) {
						loTail.next = null;
						newTab[j] = loHead;
					}
					if(hiTail ! =null) {
						hiTail.next = null;
						newTab[j+oldCap] = hiHead;
					}
				}
		}
	}
}
Copy the code

Five, thread safety multithreaded execution operation and single thread execution operation, the final data is inconsistent, this is thread unsafe.

If you want to be thread-safe, you need atomicity, visibility, and order.

Method: No other thread can operate until this thread completes or exits unexpectedly. You can add the synchronized keyword to the put process. However, this would result in each thread having a lock, which would greatly reduce efficiency. You can use hashTable or ConcurrentHashMap, but I’m not going to do too much here

** Would like to recommend a Github, which has a lot of dry goods:Github.com/XiangLinPro…

Akik: Haha

Link: m6z. Cn / 6 s8byq