Notes taken from GitHub: JavaKeeper, an arsenal of essential skills for front-line Internet developers.
As a part-time interviewer I regularly ask about Java collections during interviews, and as a candidate I have been asked about them just as often, so here are some of the common questions:

- What are the common collections?
- Does the key of a HashMap need to override hashCode() and equals()?
- Can keys and values be null in a HashMap? How many nulls are allowed?
- Is HashMap thread-safe? What is the difference between ConcurrentHashMap and Hashtable?
- The difference between List and Set. Given an ArrayList, how do I sort all its elements by some attribute?
- The difference between an ArrayList and a Vector
- Can you remove elements from a list while iterating over it? Why or why not?
Object-oriented languages represent things as objects. To make it easy to operate on many objects at once, those objects need to be stored, and collections are the most common way to store objects; they are also called containers.
As you can see from the collection framework diagram above, the Java Collections framework consists of two main types of containers
- One is a Collection, which stores a Collection of elements
- The other is a Map, which stores key/value pair mappings.
The Collection interface has three subtypes, List, Set, and Queue, followed by some abstract classes, and finally the concrete implementation classes. Common examples are ArrayList, LinkedList, HashSet, LinkedHashSet, HashMap, LinkedHashMap, and so on.
The collection framework is a unified architecture for representing and manipulating collections. All collection frameworks contain the following:

- Interfaces: abstract data types that represent collections, such as Collection, List, Set, and Map. Multiple interfaces are defined so that collection objects can be manipulated in different ways.
- Implementations (classes): concrete implementations of the collection interfaces. In essence they are reusable data structures, such as ArrayList, LinkedList, HashSet, and HashMap.
- Algorithms: useful computations performed by the methods of objects that implement the collection interfaces, such as searching and sorting. These algorithms are called polymorphic because the same method can have different implementations for similar interfaces.
So what are the common collections?
The Map and Collection interfaces are the parent interfaces of all collections frameworks:
- The subinterfaces of the Collection interface include Set, List, and Queue
- List is an ordered Collection that allows repeating elements. The main implementation classes are ArrayList, LinkedList, Stack, and Vector
- Set is an unordered Collection without repeating elements. The main implementation classes are: HashSet, TreeSet, LinkedHashSet, etc
- Map does not inherit the Collection interface. Map provides key-to-value mapping. The main implementation classes are: HashMap, TreeMap, Hashtable, ConcurrentHashMap, Properties, etc
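To make the hierarchy concrete, here is a minimal sketch (the class and variable names are only for illustration) that declares each variable against its interface and picks a common implementation:

```java
import java.util.*;

public class CollectionsDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();       // ordered, allows duplicates
        Set<String> set = new HashSet<>();           // no duplicates, no ordering guarantee
        Queue<String> queue = new LinkedList<>();    // LinkedList also implements Deque/Queue
        Map<String, Integer> map = new HashMap<>();  // key-to-value mapping

        list.add("a"); list.add("a");                // both elements are kept
        set.add("a");  set.add("a");                 // only one element is kept
        queue.offer("a");
        map.put("a", 1);

        System.out.println(list.size() + " " + set.size() + " " + map.size()); // 2 1 1
    }
}
```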
Difference between an ArrayList and a Vector
Similarities:
- ArrayList and Vector both extend the same parent class and implement the same interfaces (both are Lists: ordered, allow duplicate elements, and allow null):
  extends AbstractList<E> implements List<E>, RandomAccess, Cloneable, java.io.Serializable
- Both are backed by an array (Object[]) under the hood
- Both have a default initial capacity of 10
Differences:

- Synchronization: most of the public methods in Vector carry the synchronized keyword, which makes Vector thread-safe; ArrayList is not thread-safe
- Performance: Vector performs relatively poorly because of the lock waiting and lock release involved in synchronized methods
- Growth: when the underlying array runs out of room, ArrayList grows by 0.5 times (to 1.5x its size) while Vector doubles by default
The growth mechanism works by creating a new array and copying the elements of the old array into it. The underlying growth logic lives in grow() (based on JDK 8).
- ArrayList's grow(): when the growth condition is met, ArrayList grows to 1.5 times its size (oldCapacity >> 1 shifts right by one bit, which is equivalent to dividing by 2, i.e. 0.5 * oldCapacity).

private void grow(int minCapacity) {
// overflow-conscious code
int oldCapacity = elementData.length;
// newCapacity = oldCapacity + 0.5 * oldCapacity, i.e. grow by 50%
int newCapacity = oldCapacity + (oldCapacity >> 1);
if (newCapacity - minCapacity < 0)
newCapacity = minCapacity;
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity);
// minCapacity is usually close to size, so this is a win:
elementData = Arrays.copyOf(elementData, newCapacity);
}
- Vector's grow() uses one extra field that ArrayList does not have, capacityIncrement, which controls how much a Vector grows. If capacityIncrement is greater than 0, the new array length is the old length + capacityIncrement; otherwise the new array length is twice the old length.

private void grow(int minCapacity) {
// overflow-conscious code
int oldCapacity = elementData.length;
// grow by capacityIncrement if it is positive, otherwise double the capacity
int newCapacity = oldCapacity + ((capacityIncrement > 0) ? capacityIncrement : oldCapacity);
if (newCapacity - minCapacity < 0)
newCapacity = minCapacity;
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity);
elementData = Arrays.copyOf(elementData, newCapacity);
}
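Because every expansion allocates a new array and copies the old one, it helps to pre-size an ArrayList when the element count is roughly known up front; a minimal sketch (the count of 10_000 is just an example):

```java
import java.util.ArrayList;

public class PreSizeDemo {
    public static void main(String[] args) {
        int expected = 10_000;

        // Without pre-sizing: starts at capacity 10 and grows ~1.5x repeatedly, copying each time
        ArrayList<Integer> a = new ArrayList<>();
        // An existing list can also reserve space up front:
        a.ensureCapacity(expected);

        // With pre-sizing: a single allocation, no grow()/copy while filling
        ArrayList<Integer> b = new ArrayList<>(expected);

        for (int i = 0; i < expected; i++) {
            a.add(i);
            b.add(i);
        }
    }
}
```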
The difference between ArrayList and LinkedList
- Thread safety: neither ArrayList nor LinkedList is synchronized, that is, neither is thread-safe;
- Underlying data structure: ArrayList uses an Object array underneath; LinkedList uses a doubly-linked list (circular before JDK 1.7, a plain doubly-linked list since);
- Whether insert and delete are affected by element position:
  - ArrayList is backed by an array, so the cost of inserting and deleting depends on the position. For example, add(E e) simply appends the element to the end of the list, which is O(1); but inserting or deleting at position i (add(int index, E element)) costs O(n - i), because the (n - i) elements after position i all have to be shifted backwards/forwards by one slot.
  - LinkedList is stored as a linked list, so inserting or deleting a node whose position is already known is roughly O(1) and is not affected by where the element sits, whereas the same operation on an array is roughly O(n).
- ArrayList is generally used when reads dominate and inserts/deletes are rare; when inserts and deletes dominate, LinkedList is recommended.
- Whether fast random access is supported: LinkedList does not support efficient random access, while ArrayList implements the RandomAccess interface and does. Fast random access means retrieving an element object directly by its index (the get(int index) method). A small sketch follows this list.
- Memory footprint: ArrayList wastes space mainly in the capacity reserved at the end of the list, while LinkedList costs more per element because every node also stores references to its direct successor and predecessor in addition to the data.
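One practical consequence of the RandomAccess point above: utility code can check the RandomAccess marker interface to decide between index-based and iterator-based traversal. A minimal sketch with a hypothetical sum() helper:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class TraverseDemo {
    // Pick the traversal strategy based on the RandomAccess marker interface
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Index-based loop is cheap for ArrayList: get(i) is O(1)
            for (int i = 0; i < list.size(); i++) {
                total += list.get(i);
            }
        } else {
            // For LinkedList, get(i) is O(n), so iterate instead
            for (int v : list) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new ArrayList<>(Arrays.asList(1, 2, 3))));  // 6
        System.out.println(sum(new LinkedList<>(Arrays.asList(1, 2, 3)))); // 6
    }
}
```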
As a senior engineer you can't avoid reading the source code, so let's analyse it in a bit more detail:
- Every time an ArrayList instance is created it gets an initial capacity (10 by default if none is specified). Taking the add method as an example: if no initial capacity was specified, add first checks whether the backing array is empty; if it is, the array holding the objects is given the minimum capacity, 10 by default. When a large number of elements is added, the array is grown first so that the additions stay efficient.
- LinkedList is an ordered collection that allows duplicate elements. It is based on a doubly-linked list, with each node holding references to both its successor and its predecessor. The list has no capacity limit, but the doubly-linked structure itself uses more space and needs extra pointer manipulation. Accessing an element by index with get(i)/set(i, e) has to move a pointer into position (starting from the tail if i is past the halfway point), which means traversing the list, painfully slow for long lists. Operating at the ends with addFirst()/addLast()/removeFirst()/removeLast(), or calling remove() on an iterator(), avoids the pointer walk. LinkedList also implements the Deque interface (which extends Queue), so it can be used as a queue.
What follows does not cover every method; it is just for learning and for recording the key ideas.
ArrayList and LinkedList both implement the List interface
public class ArrayList<E> extends AbstractList<E>
implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
public class LinkedList<E>
extends AbstractSequentialList<E>
implements List<E>, Deque<E>, Cloneable, java.io.Serializable
The constructor
ArrayList provides three constructors: a no-argument constructor, one that takes an initial capacity, and one that takes a Collection.
public class ArrayList<E> extends AbstractList<E>
implements List<E>, RandomAccess, Cloneable, java.io.Serializable {
public ArrayList(int initialCapacity) {
if (initialCapacity > 0) {
// Create an array of initial capacity
this.elementData = new Object[initialCapacity];
} else if (initialCapacity == 0) {
this.elementData = EMPTY_ELEMENTDATA;
} else {
throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
}
}
public ArrayList() {
// Default is empty array
this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}
public ArrayList(Collection<? extends E> c) { // ... }
}
LinkedList provides two constructors. Since LinkedList is backed by a linked list, there is no initial capacity and no growth mechanism; nodes are simply linked in at the front or at the back.
public LinkedList() {
}
public LinkedList(Collection<? extends E> c) {
this();
addAll(c);
}
// A LinkedList is made up of Node objects
private static class Node<E> {
E item;
Node<E> next;
Node<E> prev;
Node(Node<E> prev, E element, Node<E> next) {
this.item = element;
this.next = next;
this.prev = prev;
}
}
insert
ArrayList:
public boolean add(E e) {
// Make sure the array is large enough to add the element
ensureCapacityInternal(size + 1); // Increments modCount!!
// Put the element into the array
elementData[size++] = e;
return true;
}
private void ensureCapacityInternal(int minCapacity) {
// If the array is empty, the array is initialized
if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
// DEFAULT_CAPACITY is 10, so the default array capacity is 10 when initialized by calling the default ArrayList constructor without arguments
minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
}
ensureExplicitCapacity(minCapacity);
}
private void ensureExplicitCapacity(int minCapacity) {
modCount++;
// Make sure the array has enough capacity. If not, call the grow method to expand it
if (minCapacity - elementData.length > 0)
grow(minCapacity);
}
// Specific method of capacity expansion
private void grow(int minCapacity) {
// The capacity of the current array
int oldCapacity = elementData.length;
// The new array is expanded to 1.5 times its original capacity
int newCapacity = oldCapacity + (oldCapacity >> 1);
// If the size of the new array is still smaller than the minimum required, set the size to the minimum required
if (newCapacity - minCapacity < 0)
newCapacity = minCapacity;
// MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity);
// minCapacity is usually close to size, so this is a win:
// Copy the elements into the new array
elementData = Arrays.copyOf(elementData, newCapacity);
}
Of course you can insert it at a given location, and there’s an overloaded method add(int index, E element)
public void add(int index, E element) {
// Check whether the index exceeds the range of the index
rangeCheckForAdd(index);
// This is the same operation as before, just to ensure that the array size is sufficient
ensureCapacityInternal(size + 1); // Increments modCount!!
// Move the specified position and the data following it back one bit
System.arraycopy(elementData, index, elementData, index + 1,
size - index);
// Adds the element to the specified array location
elementData[index] = element;
// The size of the ArrayList changes
size++;
}
You can see that inserting at a specified position is less efficient, because the elements after it have to be shifted on every such insert.
Insertion into a LinkedList works differently. A LinkedList is built out of Node objects:
private static class Node<E> {
E item;
Node<E> next;
Node<E> prev;
Node(Node<E> prev, E element, Node<E> next) {
this.item = element;
this.next = next;
this.prev = prev;
}
}
public boolean add(E e) {
// Add elements directly to the end of the queue
linkLast(e);
return true;
}
void linkLast(E e) {
// Save the end of the list. Last is the global variable used to represent the end of the list
final Node<E> l = last;
// Create a new node for the element e
final Node<E> newNode = new Node<>(l, e, null);
// Set the new node to the end of the queue
last = newNode;
// If the end of the queue is empty, then the entire list is empty, and the new node is assigned to the head node
if (l == null)
first = newNode;
else
// A new node is generated after the old one
l.next = newNode;
// Number of nodes +1
size++;
modCount++;
}
public void add(int index, E element) {
// Check whether the index exceeds the index range
checkPositionIndex(index);
// If you append to the end, it is the same as add(E E)
if (index == size)
linkLast(element);
else
// If not, insert it in another position
linkBefore(element, node(index));
}
// The node method, whose result is passed to linkBefore, picks the closer end to traverse from, similar in spirit to a binary-search optimization
Node<E> node(int index) {
// assert isElementIndex(index);
// If index is in the first half, go back to get node
if (index < (size >> 1)) {
Node<E> x = first;
for (int i = 0; i < index; i++)
x = x.next;
return x;
} else {
// If index is in the last half, traverse backwards to get node
Node<E> x = last;
for (int i = size - 1; i > index; i--)
x = x.prev;
return x;
}
}
void linkBefore(E e, Node<E> succ) {
// assert succ != null;
// Save the front node of the index node
final Node<E> pred = succ.prev;
// Create a new destination node
final Node<E> newNode = new Node<>(pred, e, succ);
succ.prev = newNode;
// If it is inserted at the beginning
if (pred == null)
first = newNode;
else
pred.next = newNode;
size++;
modCount++;
}
Retrieval (get)
ArrayList’s get() method is very efficient because it simply returns the element at the specified location in the array
public E get(int index) {
// Check whether the index exceeds the range of the index
rangeCheck(index);
// Returns the element at the specified position
return elementData(index);
}
LinkedList's get() method internally calls the node() method shown above, which checks whether the index falls in the first or second half of the list and traverses from the nearer end.
public E get(int index) {
checkElementIndex(index);
return node(index).item;
}
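Coming back to the question from the opening list, sorting an ArrayList by some attribute, here is a minimal sketch using a Comparator (the User class and its age field are hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortDemo {
    // Hypothetical element type with an attribute to sort by
    static class User {
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) {
        List<User> users = new ArrayList<>();
        users.add(new User("Tom", 30));
        users.add(new User("Amy", 25));
        users.add(new User("Bob", 28));

        // JDK 8+: sort in place by the age attribute
        users.sort(Comparator.comparingInt((User u) -> u.age));
        // Pre-JDK 8 equivalent: Collections.sort(users, new Comparator<User>() { ... });

        users.forEach(u -> System.out.println(u.name + " " + u.age)); // Amy 25, Bob 28, Tom 30
    }
}
```

Alternatively, the element class can implement Comparable itself and Collections.sort(users) can be called directly.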
The underlying implementation of HashMap
When would you use a HashMap? What are its characteristics?
Do you know how HashMap works?
Do you know how get and put work? What do equals() and hashCode() do?
Do you know the implementation of Hash? Why is it implemented this way?
What if the size of the HashMap exceeds the capacity defined by the Load factor?
HashMap is implemented slightly differently in JDK 7 and JDK8. Record them separately.
Before diving into HashMap, you need to understand a few concepts:
- initialCapacity: the initial capacity, i.e. the capacity of the HashMap when it is created. It can be specified in the constructor; if it is not, the capacity defaults to 16. Note that the capacity must be a power of two. (In 1.7, when the number of key-value pairs to be stored is known in advance, setting a reasonable initial capacity can noticeably improve performance.)

static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

- size: the number of key-value pairs currently stored in the HashMap, i.e. what HashMap.size() returns.
- loadFactor: the load factor. When the ratio of current size to total capacity reaches this value, the HashMap expands. It can also be specified in the constructor and defaults to 0.75. For example, for a HashMap with an initial capacity of 16, the expansion threshold is 0.75 * 16 = 12; in other words, the HashMap expands before storing the 13th value.
- threshold: the expansion threshold, i.e. threshold = total capacity of the HashMap * load factor. Expansion is performed when the current number of entries reaches the threshold, and the new capacity is twice the current total capacity; for example, a HashMap with a total capacity of 16 becomes 32 after expansion. (A small worked example follows this list.)
- table: the Entry array. Keys and values in a HashMap are stored internally as Entry objects, and table is that array of entries.
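A small worked example of the arithmetic above, assuming the default settings of 16 and 0.75:

```java
import java.util.HashMap;
import java.util.Map;

public class ThresholdDemo {
    public static void main(String[] args) {
        // threshold = capacity * loadFactor = 16 * 0.75 = 12,
        // so the table grows from 16 to 32 buckets during the 13th put
        Map<Integer, Integer> map = new HashMap<>(16, 0.75f);
        for (int i = 1; i <= 13; i++) {
            map.put(i, i); // the resize happens inside this loop, invisible to the caller
        }
        System.out.println(map.size()); // 13
    }
}
```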
JDK1.7 implementation
In JDK1.7, a HashMap consists of an array and a linked list. The array is the body of the HashMap, and the linked list is mainly used to resolve hash collisions. If the location of the array does not contain a linked list (the next of the current entry points to NULL), the search, add, and other operations are quick, requiring only one address. If the array to be located contains a linked list, the time complexity for the add operation is still O(1) because the latest Entry is inserted into the head of the list, requiring a simple change of the reference chain. For the lookup operation, it is necessary to traverse the list and then compare the search through the Equals method of the key object.
The so-called "zipper method" (separate chaining) is a combination of linked lists and arrays: create an array whose cells are linked lists, and when a hash collision occurs, append the colliding value to the list in that cell.
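For reference, the JDK 1.7 Entry node that the rest of this section manipulates looks roughly like this (a simplified sketch; the real class also implements the rest of the Map.Entry contract):

```java
import java.util.Map;

// Simplified sketch of the JDK 1.7 HashMap.Entry node
abstract class EntrySketch<K, V> implements Map.Entry<K, V> {
    final K key;
    V value;
    EntrySketch<K, V> next; // next entry in the same bucket's chain (the "zipper")
    int hash;               // cached hash of the key

    EntrySketch(int hash, K key, V value, EntrySketch<K, V> next) {
        this.hash = hash; this.key = key; this.value = value; this.next = next;
    }
}
```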
The source code parsing
A constructor
The Alibaba Java Development Manual recommends specifying an initial capacity when creating a collection (note: initialize a HashMap with HashMap(int initialCapacity)). The reasoning: www.zhihu.com/question/31…
// The default constructor uses the default initial capacity and load factor
// DEFAULT_INITIAL_CAPACITY = 16, DEFAULT_LOAD_FACTOR = 0.75F
public HashMap() {
this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
// You can specify the initial capacity and use the default load factor
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
public HashMap(int initialCapacity, float loadFactor) {
// Determine the initial capacity value
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
// Set the load factor
this.loadFactor = loadFactor;
threshold = initialCapacity;
// hook method: empty here, implemented mainly by subclasses such as LinkedHashMap
init();
}
public HashMap(Map<? extends K, ? extends V> m) {
this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
inflateTable(threshold);
putAllForCreate(m);
}
The first three constructors of a HashMap all end up calling HashMap(int initialCapacity, float loadFactor). Set the initial capacity and load factor inside it. The final init() method is empty and is implemented mainly for subclasses such as LinkedHashMap.
Put () method
public V put(K key, V value) {
// If the table array is empty, create the array first and set the expansion threshold
if (table == EMPTY_TABLE) {
inflateTable(threshold);
}
// If the key is null, call putForNullKey special processing
if (key == null)
return putForNullKey(value);
// Calculate the hash value of key
int hash = hash(key);
// Calculate the index in the array based on the calculated hash value and the current array length
int i = indexFor(hash, table.length);
// Iterate through the entire list under the array index
// If the key is already stored in the HashMap, just replace the corresponding value
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
// Check whether the hash value is the same. If so, check whether the key is the same. Different objects may have the same hash value
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
// If the key has not been stored before, the addEntry method is entered
addEntry(hash, key, value, i);
return null;
}
void addEntry(int hash, K key, V value, int bucketIndex) {
// Capacity expansion is performed when the current capacity is greater than or equal to the capacity expansion threshold
if ((size >= threshold) && (null != table[bucketIndex])) {
// Double the original capacity
resize(2 * table.length);
// recalculate the hash value
hash = (null != key) ? hash(key) : 0;
// Retrieve the index in the new array
bucketIndex = indexFor(hash, table.length);
}
// Create a node
createEntry(hash, key, value, bucketIndex);
}
// create a new array, copy all the data, and assign a reference to the new array to the table
void resize(int newCapacity) {
Entry[] oldTable = table; // Reference the Entry array before expansion
int oldCapacity = oldTable.length;
if (oldCapacity == MAXIMUM_CAPACITY) { // The size of the array before the expansion has reached the maximum (2^30)
threshold = Integer.MAX_VALUE; // Change the threshold to the maximum value of int (2^31-1) so that the capacity will not be expanded later
return;
}
// Create a new entry array
Entry[] newTable = new Entry[newCapacity];
// Copy the data from the old entry array to the new entry array
transfer(newTable, initHashSeedAsNeeded(newCapacity));
// Assign a reference to the new array to table
table = newTable;
// Calculate the new capacity expansion threshold
threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
void transfer(Entry[] newTable) {
Entry[] src = table; // SRC references the old Entry array
int newCapacity = newTable.length;
for (int j = 0; j < src.length; j++) { // Iterate over the old Entry array
Entry<K,V> e = src[j]; // Get each element of the old Entry array
if (e != null) {
src[j] = null;// Release the object reference of the old Entry array (after the for loop, the old Entry array no longer references any objects)
do {
Entry<K,V> next = e.next;
int i = indexFor(e.hash, newCapacity); / /!!!!! Recalculates the position of each element in the array
e.next = newTable[i]; / / tag [1]
newTable[i] = e; // Put the element on the array
e = next; // Access the element in the next Entry chain
} while (e != null);
}
}
}
void createEntry(int hash, K key, V value, int bucketIndex) {
// Retrieve the Entry with bucketIndex subscript in the table
Entry<K,V> e = table[bucketIndex];
// Create a new Entry using key and value
// And the Entry stored at table[bucketIndex] is the next of the new Entry
// Place the newly created Entry in table[bucketIndex]
table[bucketIndex] = new Entry<>(hash, key, value, e);
// The capacity of the current HashMap is increased by 1
size++;
}
The createEntry() method at the end shows that when a hash collision occurs, separate chaining is used to resolve it, and the new element is inserted at the head of the singly-linked list.
The get () method
public V get(Object key) {
// If the key is empty, the getForNullKey method is called for special handling
if (key == null)
return getForNullKey();
// Get the entry corresponding to the key
Entry<K,V> entry = getEntry(key);
return null == entry ? null : entry.getValue();
}
// Find the array index corresponding to the key, and then traverse the list to find it
final Entry<K,V> getEntry(Object key) {
if (size == 0) {
return null;
}
// Calculate the hash value of key
int hash = (key == null) ? 0 : hash(key);
// Get the index of the array and iterate through the list to see if there are any entries with the same key
for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
return e;
}
// If not, return null
return null;
}
JDK1.8 implementation
In JDK 1.7, when there are too many hash collisions the chains grow long; in the extreme case all values land in the same bucket and the HashMap degenerates into a single linked list, so looking up a key means traversing that list, which is inefficient. JDK 1.8 makes a major change in how hash collisions are handled: when a list grows beyond a threshold (8 by default) it is converted into a red-black tree to reduce search time.
TreeMap, TreeSet, and HashMap after JDK1.8 all use red-black trees at their underlying level. Red-black trees are designed to solve the problem of binary search trees, which degenerate into a linear structure in some cases.
The source code parsing
A constructor
The JDK8 constructor hasn’t changed much
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
this.threshold = tableSizeFor(initialCapacity);
}
public HashMap(Map<? extends K, ? extends V> m) {
this.loadFactor = DEFAULT_LOAD_FACTOR;
putMapEntries(m, false);
}
Determine the hash bucket array index position (implementation of hash function)
// Method 1:
static final int hash(Object key) { // jdk1.8 & jdk1.7
int h;
// Step 1: h = key.hashCode(), take the hashCode value
// Step 2: h ^ (h >>> 16), mix the high 16 bits into the low 16 bits
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// Method 2:
static int indexFor(int h, int length) { // in JDK 1.8 this is inlined as: tab[i = (n - 1) & hash]
return h & (length - 1); // Step 3: the "modulo" (masking) operation
}
The hash map locates the index position of the array, which directly determines the discrete performance of the hash method. Essentially, a Hash algorithm consists of three steps: the hashCode of the key, the high order operation, and the modulus operation.
Why is that?
Why is the length of a HashMap a power of 2?

The purpose, of course, is to reduce hash collisions and make the table data distribution more uniform.

- In a HashMap the bucket array length is always a power of 2. In that case h & (table.length - 1) is equivalent to the modulo h % length, but modulo is not as efficient as the bit operation, so this is an optimization. If h = 185 and table.length - 1 = 15 (0b1111), the hash is only effective in its low 4 bits, which makes collisions easy.
- The hash is derived from the key's hashCode. When taking the remainder, because n is relatively small, only the low 4 bits of the hash participate in the calculation and the high bits are effectively ignored, so the result depends only on the low-order information. To address this flaw, we can XOR the high bits of the hash into the low bits, for example hash ^ (hash >>> 4); the high-order and low-order data are XORed together, which increases the randomness of the low bits and lets the high bits participate in the calculation indirectly.

In Java, the hashCode() method produces a hash of type int, which is 32 bits wide. The first 16 bits are the high half and the last 16 bits are the low half, so the real implementation shifts right by 16 bits: hash ^ (hash >>> 16). This increases the randomness of the low bits and in turn improves the hash distribution.
Why is the length of a HashMap a power of 2?
In order to make HashMap access efficient and minimize collisions, that is, to distribute data evenly, the Hash value ranges from -2147483648 to 2147483647, with a total of 4 billion mapping space. As long as the Hash function mapping is relatively uniform and loose, collisions are difficult to occur in general applications. But one problem is that 4 billion of array memory doesn’t fit. So you can’t use this hash value directly. Before we use it, we need to modulo the length of the array to get the remainder before we use it to store the location, that is, the corresponding array subscript. The array subscript is computed as (n-1)&hash, where n represents the length of the array.
How should this algorithm be designed?
The first thing we might think of is taking the remainder with %. But there is a catch.

For the modulo operation, if the divisor is a power of two, it is equivalent to an AND with the divisor minus one, that is, hash % length == hash & (length - 1), provided length is 2 to the n; and the & operation is more efficient than %. This explains why the length of a HashMap is always a power of two.
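A small sketch that checks both points above: when the table length is a power of two, h & (length - 1) equals h % length (for non-negative h), and the h ^ (h >>> 16) perturbation lets the high bits influence the bucket index:

```java
public class IndexDemo {
    public static void main(String[] args) {
        int length = 16;    // power of two, so length - 1 == 0b1111
        int h = 0x12345678; // an arbitrary non-negative hash value

        // Bit masking and modulo agree for non-negative h when length is a power of two
        System.out.println(h % length);        // 8
        System.out.println(h & (length - 1));  // 8

        // Perturbation: XOR the high 16 bits into the low 16 bits before masking
        int perturbed = h ^ (h >>> 16);
        System.out.println(perturbed & (length - 1)); // 12, the high bits now affect the index
    }
}
```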
The get () method
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
// Locate the bucket where the key-value pair resides
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
// Direct hit on the first node of the bucket
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
// Not a direct hit
if ((e = first.next) != null) {
// If first is a TreeNode, use the red-black tree lookup
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
// Otherwise search the linked list
if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
Put () method
public V put(K key, V value) {
// Hash key hashCode()
return putVal(hash(key), key, value, false, true);
}
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
// if TAB is empty, it will be created
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
// Compute index and handle null
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
Node<K,V> e; K k;
// The node key exists, overwriting value directly
if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
e = p;
// Determine the chain to be a red-black tree
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
// The chain is a linked list
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
// List length greater than 8 is converted to red black tree for processing
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
// Key already exists overwriting value directly
if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
if (e != null) { // existing mapping for key
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
++modCount;
// Expand the capacity when the capacity exceeds the maximum
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
What is the flow of HashMap's put() method?

①. Check whether the table array of key-value pairs is empty or null; if so, call resize() to create it;
②. Compute the array index i from the key's hash value; if table[i] == null, create a new node and insert it directly, then go to ⑥; if table[i] is not null, go to ③;
③. Check whether the first element of table[i] has the same key (same hash and equals); if so, overwrite its value directly; otherwise go to ④;
④. Check whether table[i] is a TreeNode, i.e. whether table[i] is a red-black tree; if it is, insert the key-value pair into the tree directly; otherwise go to ⑤;
⑤. Traverse the list at table[i] and check whether its length is greater than 8; if it is, convert the list into a red-black tree and insert there, otherwise insert into the linked list; if the key is found to already exist during the traversal, just overwrite its value;
⑥. After a successful insert, check whether the actual number of key-value pairs (size) exceeds the threshold; if it does, expand the capacity.
The resize () expansion
final Node<K,V>[] resize() {
Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
// If it exceeds the maximum value, it will no longer be expanded
if (oldCap >= MAXIMUM_CAPACITY) {
// Change the threshold to the maximum value of int (2^31-1) so that the capacity will not be expanded later
threshold = Integer.MAX_VALUE;
return oldTab;
}
// If the value is not higher than the maximum value, the value is doubled
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
else { // zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// Calculate the new resize upper limit
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
// Start copying to the new array
@SuppressWarnings({"rawtypes", "unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
// Move each bucket to the new buckets
if (oldTab != null) {
// Loop over the old table array
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if ((e = oldTab[j]) != null) {
oldTab[j] = null;
// If the bucket has only one element, copy it directly
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
// If it is a red-black tree, then split the red-black tree
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
// Walk through the list and divide the original list into LO and HI
// Where lo stands for the old bucket position, hi stands for the new bucket position
else { // preserve order: split the original list into lo and hi lists during rehash
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
// original index
next = e.next;
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
// Old index +oldCap
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
// Put the original index in the bucket
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
// Add oldCap to bucket
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
How is the HashMap expansion operation implemented?
- In JDK 1.8, resize() is called during initialization or when the number of key-value pairs in the HashMap exceeds the threshold;
- Each expansion doubles the capacity;
- After expansion, each Node either stays at the same index or moves to (original index + old capacity).

In putVal() we can see that resize() is used in two places: the table is created on the first insert, and the table is expanded when the actual number of entries grows past the expansion threshold (0.75 * 16 = 12 the first time). This is where JDK 1.8 adds an optimization: in 1.7, after an expansion, the hash had to be recomputed and every entry redistributed according to it; in 1.8, for the entries in one bucket, if (e.hash & oldCap) == 0 the entry stays at its original index, otherwise it moves to (original index + old capacity).
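A small worked example of the (e.hash & oldCap) == 0 test, assuming the table grows from 16 to 32 buckets:

```java
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;

        int h1 = 0b0_0101; // bit 4 (value 16) is 0
        int h2 = 0b1_0101; // bit 4 (value 16) is 1

        // The new mask exposes exactly one extra bit, so only that bit decides the move
        System.out.println((h1 & (oldCap - 1)) + " -> " + (h1 & (newCap - 1))); // 5 -> 5  (stays put)
        System.out.println((h2 & (oldCap - 1)) + " -> " + (h2 & (newCap - 1))); // 5 -> 21 (old index + oldCap)

        // The test used in resize(): bit 4 decides which of the two cases applies
        System.out.println((h1 & oldCap) == 0); // true  -> lo list, same bucket index j
        System.out.println((h2 & oldCap) == 0); // false -> hi list, bucket index j + oldCap
    }
}
```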
Treeifying the list
When the length of the list in a bucket exceeds TREEIFY_THRESHOLD (8 by default), the treeifyBin method is called to treeify it. Treeification does not necessarily happen, though, because treeifyBin also checks whether the HashMap capacity is at least 64: if it is, the list is treeified; otherwise the table is expanded first.

Why check whether the total capacity is at least 64 before treeifying?
When the bucket array capacity is small, the collision rate of key-value on node hash may be high, resulting in long list length, resulting in low query efficiency. At this point, we have two options, one is to expand, to make the hash collision rate lower. The other is treification to improve query efficiency.
If we expand instead, all we have to do is re-link the existing list data. If we treeify, we have to turn lists into red-black trees. Up to this point the difference does not look big, but let the scenario continue: as more and more data is inserted, more and more lists would need to be converted into trees.

At some capacity the table has to expand anyway. By that point there would be many treeified red-black trees, and during the expansion many of them would have to be split back into lists, which is a considerable cost. If we expand while the capacity is still small, fewer lists need to be treeified and the cost of expansion stays lower.
Let’s see how list tree works:
final void treeifyBin(Node<K,V>[] tab, int hash) {
int n, index; Node<K,V> e;
// 1. If the capacity is smaller than MIN_TREEIFY_CAPACITY, capacity expansion is preferred
if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
resize();
// 2. If the bucket is not empty, tree it
else if ((e = tab[index = (n - 1) & hash]) != null) {
TreeNode<K,V> hd = null, tl = null;
// 2.1 First convert the linked list to TreeNode's bidirectional list
do {
TreeNode<K,V> p = replacementTreeNode(e, null);
if (tl == null)
hd = p;
else {
p.prev = tl;
tl.next = p;
}
tl = p;
} while ((e = e.next) != null);
// 2.2 Tree the bidirectional linked list represented by TreeNode
if ((tab[index] = hd) != null)
hd.treeify(tab);
}
}
The overall idea of treeifying a list should now be a bit clearer: the list is first converted into a doubly-linked list of TreeNode objects, and then treeify() is called on it. Let's move on to the implementation of the treeify() method.
final void treeify(Node<K,V>[] tab) {
TreeNode<K,V> root = null;
// 1. Traverse the bidirectional TreeNode linked list and insert TreeNode nodes one by one
for (TreeNode<K,V> x = this, next; x != null; x = next) {
next = (TreeNode<K,V>)x.next;
x.left = x.right = null;
// 2. If the root node is empty, set node X directly as the root node
if (root == null) {
x.parent = null;
x.red = false;
root = x;
}
else {
K k = x.key;
int h = x.hash;
Class<?> kc = null;
// 3. If the root node is not empty, compare and insert it in the appropriate place
for (TreeNode<K,V> p = root;;) {
int dir, ph;
K pk = p.key;
// 4. Calculate the dir value, -1 means to find the insertion point from the left subtree, 1 means to find the insertion point from the right subtree
if ((ph = p.hash) > h)
dir = -1;
else if (ph < h)
dir = 1;
else if ((kc == null &&
(kc = comparableClassFor(k)) == null) ||
(dir = compareComparables(kc, k, pk)) == 0)
dir = tieBreakOrder(k, pk);
TreeNode<K,V> xp = p;
// 5. If a point p is found, this point is a leaf node
// Then this is the insertion position
if ((p = (dir <= 0) ? p.left : p.right) == null) {
x.parent = xp;
if (dir <= 0)
xp.left = x;
else
xp.right = x;
// Perform dynamic balancing after insertion
root = balanceInsertion(root, x);
break;
}
}
}
}
// 6. Move the root node to the head of the bucket's list
moveRootToFront(tab, root);
}
As you can see from the code above, the treeify() method simply turns the doubly-linked TreeNode list into a red-black tree. The general steps are:

- Traverse the TreeNode doubly-linked list and insert the TreeNode nodes one by one.
- If the root node is empty, the red-black tree is currently empty and the node simply becomes the root. Otherwise, an appropriate position has to be found to insert the TreeNode.
- Keep comparing against the current node, starting from the root, to find the most suitable spot. When the chosen child of a node turns out to be null, that node xp is the parent of the inserted node: the x node's parent reference is pointed at xp, and xp's left or right child is pointed at x.
- balanceInsertion() is then called to restore the red-black balance dynamically.
- Finally, the moveRootToFront() method is called to move the root node to the head of the bucket's list.

balanceInsertion() performs the dynamic rebalancing of the red-black tree after insertion. Finally, let's look at the moveRootToFront() method.
static <K,V> void moveRootToFront(Node<K,V>[] tab, TreeNode<K,V> root) {
int n;
if (root != null && tab != null && (n = tab.length) > 0) {
int index = (n - 1) & root.hash;
TreeNode<K,V> first = (TreeNode<K,V>)tab[index];
// If the root node inserted into the red-black tree is not the first element in the list
// Insert the root node before the first node
if (root != first) {
Node<K,V> rn;
tab[index] = root;
TreeNode<K,V> rp = root.prev;
// The following two if statements take the root node out of the list:
// make the predecessor of the root node point to the successor node
// make the successor of the root node point to its predecessor
if ((rn = root.next) != null)
((TreeNode<K,V>)rn).prev = rp;
if (rp != null)
rp.next = rn;
// Insert the root node directly before the first node
if (first != null)
first.prev = root;
root.next = first;
root.prev = null;
}
assert checkInvariants(root);
}
}
Red black tree split
After capacity expansion, common nodes need to be remapped, and red-black tree nodes are no exception. As a general idea, we can first turn a red-black tree into a linked list and then remap the linked list. However, because the TreeNode stores the insertion order of the elements when the red-black tree is inserted, it can be directly restored to the linked list according to the insertion order. In this way, the hash mapping is avoided after the red-black tree is transformed into a linked list, which improves efficiency virtually.
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
TreeNode<K,V> b = this;
// Relink into lo and hi lists, preserving order
TreeNode<K,V> loHead = null, loTail = null;
TreeNode<K,V> hiHead = null, hiTail = null;
int lc = 0, hc = 0;
// 1. Think of a red-black tree as a bidirectional list of Treenodes
// Add them to the linked list starting with loHead and hiHead
for (TreeNode<K,V> e = b, next; e != null; e = next) {
next = (TreeNode<K,V>)e.next;
e.next = null;
// 1.1. The node is added to the loHead list in the same position after expansion
if ((e.hash & bit) == 0) {
if ((e.prev = loTail) == null)
loHead = e;
else
loTail.next = e;
loTail = e;
++lc;
}
// 1.2 After expansion, the position is changed and added to the hiHead linked list
else {
if ((e.prev = hiTail) == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
++hc;
}
}
// 2. Treeify or un-treeify the loHead and hiHead lists
if (loHead != null) {
// If the list length is at or below the threshold, convert back to a plain list, otherwise treeify
if (lc <= UNTREEIFY_THRESHOLD)
tab[index] = loHead.untreeify(map);
else {
tab[index] = loHead;
if (hiHead != null) // (else is already treeified)
loHead.treeify(tab);
}
}
if (hiHead != null) {
if (hc <= UNTREEIFY_THRESHOLD)
tab[index + bit] = hiHead.untreeify(map);
else {
tab[index + bit] = hiHead;
if (loHead != null)
hiHead.treeify(tab);
}
}
}
We know from the above code that the expansion of a red-black tree is the same as the transfer of a linked list. The difference is that after it is converted to a hiHead or loHead, it will choose whether to split into a linked list or maintain the structure of a red-black tree according to the length of the list.
Converting a red-black tree back to a list
When we talked about red-black tree splitting, we said that if the red-black tree structure is below the threshold when it expands, it will be converted to a linked list. Its implementation code is as follows:
final Node<K,V> untreeify(HashMap<K,V> map) {
Node<K,V> hd = null, tl = null;
for (Node<K,V> q = this; q != null; q = q.next) {
Node<K,V> p = map.replacementNode(q, null);
if (tl == null)
hd = p;
else
tl.next = p;
tl = p;
}
return hd;
}
Because the red-black tree contains the order in which the elements are inserted, when we split the red-black tree into two linked lists, hiHead and loHead, the order remains the same. So all we need to do is loop through and replace TreeNode with Node.
HashMap: JDK1.7 VS JDK1.8
JDK1.8 mainly addresses or optimizes the following issues:
- Resize (capacity expansion) optimization
- The red-black tree is introduced to avoid the poor query efficiency caused by an overly long chain
- It fixes the multi-threaded infinite-loop problem during resize, but HashMap is still not thread-safe and may lose data when used from multiple threads
| Difference | JDK 1.7 | JDK 1.8 |
| --- | --- | --- |
| Storage structure | Array + linked list | Array + linked list + red-black tree |
| Initialization | A dedicated function: inflateTable() | Folded into the expansion function resize() |
| Hash calculation | 9 perturbations = 4 bit operations + 5 XOR operations | 2 perturbations = 1 bit operation + 1 XOR operation |
| How data is stored | No collision: store in the array; collision: store in the linked list | No collision: store in the array; collision & list length < 8: store in a singly-linked list; collision & list length > 8: treeify into a red-black tree |
| Insertion into a chain | Head insertion (move the existing data back one position, then insert the new data at the head) | Tail insertion (append directly to the end of the list / red-black tree) |
| Index after expansion | Recompute everything the original way (hashCode -> perturbation function -> (h & (length - 1))) | Computed by rule from the expansion (new position = original position, or original position + old capacity) |
Hashtable
Hashtable, like HashMap, is a hash table implemented with the "zipper method" (separate chaining), and the stored data is an Entry object just as in the JDK 7 HashMap. The difference is that almost all public methods in Hashtable are synchronized, and a few methods use synchronized blocks internally, so efficiency inevitably drops. In addition, its put() method does not allow null keys or values.
Difference between HashMap and Hashtable
- Thread safety: HashMap is not thread-safe, Hashtable is thread-safe; the methods inside Hashtable are basically all modified with synchronized. (If you need thread safety, use ConcurrentHashMap!)
- Efficiency: because it does not pay the locking cost, HashMap is a bit more efficient than Hashtable. Also, Hashtable is largely obsolete, so don't use it in new code.
- Support for null keys and null values: in a HashMap, null can be used as a key (there can be only one such key) and one or more keys can map to the value null. Hashtable throws a NullPointerException as soon as a key or value is null.
- Initial capacity and growth:
  ① If no initial capacity is given when creating a Hashtable, its default initial size is 11, and each expansion grows the capacity to 2n + 1. HashMap's default initial size is 16, and each expansion doubles the capacity.
  ② If an initial capacity is given, Hashtable uses the given size directly, while HashMap expands it to a power of 2. That is, a HashMap always uses a power of two as the size of its hash table (the reason was explained in the HashMap section above).
- Underlying data structure: since JDK 1.8, HashMap converts a list into a red-black tree when the list length exceeds the threshold (8 by default) to reduce search time; Hashtable has no such mechanism.
- Iterators: HashMap's Iterator is fail-fast, while Hashtable's Enumeration is not. If another thread adds or removes elements during iteration, HashMap throws a ConcurrentModificationException, but removing an element through the iterator's own remove method does not. This is also one of the differences between Enumeration and Iterator.
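That last point also answers the question from the opening list about removing elements while iterating: structurally modifying the list inside a for-each loop trips the fail-fast check, while Iterator.remove() stays safe. A minimal sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class FailFastDemo {
    public static void main(String[] args) {
        // Wrong: structural modification during a for-each loop;
        // the next call to next() sees the changed modCount and fails fast
        List<Integer> a = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        try {
            for (Integer v : a) {
                if (v == 2) a.remove(v); // modCount changes behind the iterator's back
            }
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast triggered");
        }

        // Right: let the iterator do the removal, which keeps expectedModCount in sync
        List<Integer> b = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        for (Iterator<Integer> it = b.iterator(); it.hasNext(); ) {
            if (it.next() == 2) it.remove();
        }
        System.out.println(b); // [1, 3, 4]
    }
}
```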
ConcurrentHashMap
When a HashMap is used from multiple threads and a put pushes it past the threshold (determined by the load factor), an expansion, i.e. a rehash, is triggered that rehashes the contents of the original array into the new, larger array. If several threads do this at the same time, entries that hash to the same bucket can end up linked into a cycle, and a later get then falls into an infinite loop. That is why HashMap is not thread-safe. (Reference: www.jianshu.com/p/e2f75c8cc…)
Hashtable is thread-safe. It uses the synchronized keyword to lock the entire table, which means that all threads are competing for a lock. In a multi-threaded environment, Hashtable is safe, but undoubtedly inefficient.
JDK1.7 implementation
The reason the Hashtable container is inefficient under heavy concurrent contention is that every thread accessing it competes for the same lock. If the container held multiple locks, each guarding a portion of its data, then threads accessing different portions would not contend for locks at all; this is the lock segmentation (lock striping) technique that ConcurrentHashMap uses.
In JDK1.7, the data structure of ConcurrentHashMap consists of an array of segments and multiple hashentries. The Segment array is used to split a large table into smaller tables for locking. Each Segment element stores a HashEntry array + linked list, which is the same data storage structure as a HashMap.
The ConcurrentHashMap class contains two static inner classes, HashEntry and Segment. HashEntry is used to encapsulate key-value pairs of the mapping table. Segment acts as a lock. Each Segment object guards buckets of the entire hash mapping table. Each bucket is a linked list of several HashEntry objects. An instance of ConcurrentHashMap contains an array of Segment objects. Each Segment guards an element in a HashEntry array. When modifying the HashEntry array, you must first acquire its Segment lock.
Segment class
The Segment class inherits from the ReentrantLock class, enabling the Segment object to act as a ReentrantLock. A Segment is a subhash table, and a HashEntry array is maintained in the Segment. In a concurrent environment, there is no need to consider lock contention for different segments.
As you can see from the source code, the Segment inner class is very similar to the HashMap we saw above. There are load factors, thresholds, all sorts of things.
static final class Segment<K,V> extends ReentrantLock implements Serializable {
static final int MAX_SCAN_RETRIES =
Runtime.getRuntime().availableProcessors() > 1 ? 64 : 1;
transient volatile HashEntry<K,V>[] table;
transient int count;
transient int modCount; // Record the number of changes
transient int threshold;
final float loadFactor;
Segment(float lf, int threshold, HashEntry<K,V>[] tab) {
this.loadFactor = lf;
this.threshold = threshold;
this.table = tab;
}
// the put method will add the lock,
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
HashEntry<K,V> node = tryLock() ? null :
scanAndLockForPut(key, hash, value);
// ...
}
@SuppressWarnings("unchecked")
private void rehash(HashEntry<K,V> node) {
// ...
}
private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
/ /...
}
private void scanAndLock(Object key, int hash) {
/ /...
}
final V remove(Object key, int hash, Object value) {
/ /...
}
final boolean replace(K key, int hash, V oldValue, V newValue) {
/ /...
}
final V replace(K key, int hash, V value) {
/ /...
}
final void clear() {
// ...
}
}
HashEntry class
HashEntry is currently the smallest logical processing unit. A ConcurrentHashMap maintains an array of segments, and a Segment maintains a HashEntry array.
static final class HashEntry<K,V> {
final int hash;
final K key;
volatile V value; // value is volatile, which guarantees visibility to readers
volatile HashEntry<K,V> next;
/ /...
}
ConcurrentHashMap class
By default, each ConcurrentHashMap class creates 16 concurrent segments, each containing multiple Hash tables, and each Hash chain consists of HashEntry nodes.
public class ConcurrentHashMap<K,V> extends AbstractMap<K,V>
implements ConcurrentMap<K,V>, Serializable {
// The default capacity is 16, that is, 16 buckets by default
static final int DEFAULT_INITIAL_CAPACITY = 16;
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// The default concurrency level is 16. This value represents the estimate of the current update thread
static final int DEFAULT_CONCURRENCY_LEVEL = 16;
static final int MAXIMUM_CAPACITY = 1 << 30;
static final int MIN_SEGMENT_TABLE_CAPACITY = 2;
static final int MAX_SEGMENTS = 1 << 16; // slightly conservative
static final int RETRIES_BEFORE_LOCK = 2;
final int segmentMask; // Segment mask is used to locate segments
final int segmentShift;
final Segment<K,V>[] segments; // The trunk is the segmented lock array
/ / the constructor
public ConcurrentHashMap(int initialCapacity,
float loadFactor, int concurrencyLevel) {
if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
throw new IllegalArgumentException();
//MAX_SEGMENTS = 1<<16=65536, that is, the maximum number of concurrent connections is 65536
if (concurrencyLevel > MAX_SEGMENTS)
concurrencyLevel = MAX_SEGMENTS;
// 2^sshift = ssize, e.g. ssize = 16 -> sshift = 4, ssize = 32 -> sshift = 5
int sshift = 0;
// ssize is the length of the segments array, calculated based on concurrentLevel
int ssize = 1;
while (ssize < concurrencyLevel) {
++sshift;
ssize <<= 1;
}
this.segmentShift = 32 - sshift;
this.segmentMask = ssize - 1;
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
int c = initialCapacity / ssize;
if (c * ssize < initialCapacity)
++c;
int cap = MIN_SEGMENT_TABLE_CAPACITY;
while (cap < c)
cap <<= 1;
// Create a segments array and initialize the first Segment, delaying initialization for the rest of the Segment
Segment<K,V> s0 =
new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
(HashEntry<K,V>[])new HashEntry[cap]);
Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
this.segments = ss;
}
}
Put () method
- **Locate the Segment and make sure it is initialized**
- Call that Segment's put method.
public V put(K key, V value) {
Segment<K,V> s;
//concurrentHashMap Key /value cannot be empty
if (value == null)
throw new NullPointerException();
// The hash function rehashes the key's hashCode to avoid bad, irrational hashcodes and ensure a uniform hash
int hash = hash(key);
// The hash value returned is unsigned right shift segmentShift bit and segment mask bit arithmetic to locate the segment
int j = (hash >>> segmentShift) & segmentMask;
if ((s = (Segment<K,V>)UNSAFE.getObject // nonvolatile; recheck
(segments, (j << SSHIFT) + SBASE)) == null) // in ensureSegment
s = ensureSegment(j);
return s.put(key, hash, value, false);
}
The get () method
The get method does not need to be locked, because the shared variables involved are volatile, which guarantees memory visibility and therefore does not read stale data
public V get(Object key) {
Segment<K,V> s; // manually integrate access methods to reduce overhead
HashEntry<K,V>[] tab;
int h = hash(key);
long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
(tab = s.table) != null) {
for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
(tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
e != null; e = e.next) {
K k;
if ((k = e.key) == key || (e.hash == h && key.equals(k)))
returne.value; }}return null;
}
Copy the code
JDK1.8 implementation
ConcurrentHashMap was heavily rewritten in JDK 8; the class alone grew from a little over 1,000 lines to more than 6,000. JDK 1.8 abandons the Segment concept and uses CAS + synchronized to guarantee thread safety.

The data structure is now very similar to HashMap in JDK 1.8: the underlying structure becomes array + linked list + red-black tree.
Similarities with HashMap in JDK 1.8:

- The underlying data structure is the same
- Initialization is lazy: the table is created the first time an element is put, not in the constructor
- The length of the underlying array is always a power of 2
- The default treeify threshold is 8 and the untreeify threshold is 6 (at or below it, a red-black tree reverts to a linked list)
- The hash algorithm is similar, but has an extra `& HASH_BITS` step that clears the sign bit so the hash is never negative; negative hash values have special meanings in ConcurrentHashMap (forwarding nodes during expansion, tree bins, and so on)

```java
static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash

static final int spread(int h) {
    return (h ^ (h >>> 16)) & HASH_BITS;
}
```
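As a quick illustration (my own snippet, not JDK code): the XOR mixes the high 16 bits into the low bits, and the final `& HASH_BITS` clears the sign bit, so spread() never returns a negative value even for a negative hashCode.

```java
// Standalone sketch: spread() as defined above never returns a negative value.
public class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff;

    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    public static void main(String[] args) {
        int negativeHash = -123456789;                  // a hashCode with the sign bit set
        System.out.println(spread(negativeHash));       // non-negative
        System.out.println(spread("key".hashCode()));   // non-negative as well
    }
}
```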
Some key attributes
```java
private static final int MAXIMUM_CAPACITY = 1 << 30;      // Maximum table size, same as HashMap
private static final int DEFAULT_CAPACITY = 16;           // Default table size
static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;  // Largest possible array size, related to the toArray() methods
private static final int DEFAULT_CONCURRENCY_LEVEL = 16;  // Default concurrency level, kept for compatibility
private static final float LOAD_FACTOR = 0.75f;           // Load factor; the actual resize threshold is computed as (n << 1) - (n >>> 1), which is faster
static final int TREEIFY_THRESHOLD = 8;                    // List-to-tree threshold: treeify when a bin holds more than 8 nodes
static final int UNTREEIFY_THRESHOLD = 6;                  // Tree-to-list threshold: during transfer (resize), a tree bin that shrinks to <= 6 nodes is untreeified; untreeify only happens during resize
static final int MIN_TREEIFY_CAPACITY = 64;                // Minimum table size before bins are treeified
private static final int MIN_TRANSFER_STRIDE = 16;         // Minimum number of bins each thread handles during a transfer (resize)
private static int RESIZE_STAMP_BITS = 16;                 // Number of bits used for the generation stamp in sizeCtl
private static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1; // 2^16 - 1, maximum number of threads that can help resize
private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;        // 32 - 16 = 16, offset of the resize stamp within sizeCtl
static final int MOVED     = -1; // Hash of forwarding nodes
static final int TREEBIN   = -2; // Hash of roots of tree bins
static final int RESERVED  = -3; // Hash of transient reservation nodes
static final int HASH_BITS = 0x7fffffff; // Mask used to clear the sign bit when spreading hashes
static final int NCPU = Runtime.getRuntime().availableProcessors(); // Number of available processors

/* ---------------- Fields -------------- */

// The Node array that backs the ConcurrentHashMap. It is created lazily on the first insertion,
// and its length is always a power of 2.
transient volatile Node<K,V>[] table;

// The next table to use; non-null only while a resize is in progress.
private transient volatile Node<K,V>[] nextTable;

// Base element count, updated via CAS; by itself it is not the full size of the map
// (the complete count also includes the CounterCells).
private transient volatile long baseCount;

// Controls table initialization and resizing.
//  - Negative: -1 means the table is being initialized; -(1 + n) means n threads are resizing.
//  - Positive: before the table is created, it holds the initial table size to use;
//    after initialization, it holds the next resize threshold (capacity x load factor).
//  - 0: the table will be created with the default initial capacity.
private transient volatile int sizeCtl;
```
put() method

- First check that key and value are not null; if either is null, throw an exception
- Compute the hash with spread() to reduce hash collisions
- If the table array has not been initialized yet, initialize it with initTable()
- If the bucket located for the new key is empty, try to insert the new node directly with CAS
- Check whether a resize is in progress: a head-node hash of MOVED (-1) means the map is expanding, so the current thread helps with the transfer
- When table[i] is the head of a linked list, insert the new value into the list, using synchronized (f) on the head node for thread safety
  - If a node with the same key as the pair being inserted is found, its value is overwritten
  - Otherwise the key-value pair is appended to the end of the list
- When table[i] is the root of a red-black tree (TreeBin), insert the new value / overwrite the old value in the tree
- After insertion, if the number of nodes in the bin is greater than or equal to 8, call treeifyBin to convert table[i] (the i-th hash bucket) from a list to a red-black tree
- Finally, update the count and resize if the size exceeds the threshold (capacity x load factor)
```java
final V putVal(K key, V value, boolean onlyIfAbsent) {
    // Neither key nor value can be null
    if (key == null || value == null) throw new NullPointerException();
    // Compute the hash from the key
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        // Initialize the table if necessary
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        // f is the head node of the bucket located by the current key.
        // If it is empty, try to write the new node with CAS; the outer loop retries (spins) on failure.
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        // If the head node's hash is MOVED (-1), a resize is in progress, so help transfer
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        // Otherwise write under a synchronized lock on the head node
        else {
            // Insert into a linked list or a red-black tree
            V oldVal = null;
            // The intrinsic lock on the head node provides the concurrency control
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    // fh = f.hash >= 0 means this bin is a linked list; insert the key-value pair into it
                    if (fh >= 0) {
                        binCount = 1;
                        // If a node with the same key is found, replace its value; otherwise append a new node at the tail
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    // Insert the new key-value pair into the red-black tree
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                              value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            // Convert the bin to a red-black tree if it has grown past the treeify threshold
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    // Update the count; if the size exceeds the threshold (capacity x load factor), a resize is triggered
    addCount(1L, binCount);
    return null;
}
```
As you can see, the JDK 8 implementation follows the same idea as JDK 7 but locks a single Node (bucket head) rather than a whole Segment. The operations performed before the Node is locked are lock-free and remain thread safe thanks to the atomic CAS operations described above.
get() method
The get method does not need to be locked, because the shared variables involved are volatile, which guarantees memory visibility and therefore does not read stale data
```java
public V get(Object key) {
    Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
    int h = spread(key.hashCode());
    // Check that the table is non-empty and the bucket for this hash is non-null
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (e = tabAt(tab, (n - 1) & h)) != null) {
        // Check whether the head node of the bucket is the one we are looking for
        if ((eh = e.hash) == h) {
            if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                return e.val;
        }
        // A hash less than 0 marks a special node (TreeBin or ForwardingNode); delegate to its find()
        else if (eh < 0)
            return (p = e.find(h, key)) != null ? p.val : null;
        // Otherwise the bucket is a linked list; traverse it
        while ((e = e.next) != null) {
            if (e.hash == h &&
                ((ek = e.key) == key || (ek != null && key.equals(ek))))
                return e.val;
        }
    }
    return null;
}
```
The difference between Hashtable and ConcurrentHashMap
The main difference between ConcurrentHashMap and Hashtable lies in how thread safety is implemented.

- Data structure: in JDK 1.7, ConcurrentHashMap uses a segmented array + linked lists (a Segment array whose segments each hold HashEntry chains). In JDK 1.8 its structure is similar to HashMap 1.8: array + linked list / red-black tree. Hashtable, like HashMap before JDK 1.8, uses array + linked list, where the array is the body and the linked lists mainly resolve hash collisions.
- Thread-safe approach (important):
  - ConcurrentHashMap: in JDK 1.7, the bucket array is split into Segments, and each lock guards only a portion of the container's data, so threads accessing different segments do not contend and concurrency improves (with the default of 16 segments, up to 16 threads can write at the same time, whereas Hashtable allows only one). In JDK 1.8 the Segment concept is abandoned; concurrency control is implemented directly on the Node array + linked list / red-black tree using synchronized and CAS, so the whole thing looks like an optimized, thread-safe HashMap. The Segment class is still visible in JDK 1.8, but only in a simplified form kept for compatibility with older versions.
  - Hashtable (one lock for the whole table): thread safety relies on synchronized methods, which is very inefficient. While one thread is executing a synchronized method, other threads calling synchronized methods block or spin; for example, while one thread adds an element with put, another thread can neither put nor get.
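A minimal usage sketch (my own example, not from the sources above): several threads can update a ConcurrentHashMap at the same time without any external locking, which is exactly what Hashtable's single lock prevents from scaling.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: concurrent counting with ConcurrentHashMap.
// merge() performs the read-modify-write atomically per key, so no external lock is needed.
public class ConcurrentCountDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                counts.merge("hits", 1, Integer::sum);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counts.get("hits")); // 20000
    }
}
```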
The difference between Java fail-fast and fail-safe

Fail-fast
When a collection is traversed with an iterator and the collection's contents are modified during the traversal (elements added, removed, or changed), a ConcurrentModificationException is thrown.

How it works: the iterator accesses the collection's contents directly during traversal and maintains a modCount variable. Whenever the collection's contents change, modCount changes as well. Each time the iterator moves to the next element via hasNext()/next(), it checks whether modCount equals the expected expectedModCount; if it does, the traversal continues, otherwise an exception is thrown and the traversal terminates.

Note: the exception is thrown when the check modCount != expectedModCount fails. If a concurrent modification happens to leave modCount equal to expectedModCount, the exception is not thrown. Therefore, programs must not depend on this exception for the correctness of concurrent operations; it is only meant to help detect concurrent-modification bugs.

Scenario: the collection classes in the java.util package are fail-fast and cannot be concurrently modified (structurally changed during iteration) by multiple threads.
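A minimal sketch (my own example) that triggers the fail-fast behavior even in a single thread, because the list is structurally modified while its iterator is in use:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: modifying an ArrayList directly while iterating it changes modCount,
// so the next call to next() throws ConcurrentModificationException.
public class FailFastDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");
        for (String s : list) {          // for-each uses an Iterator under the hood
            if ("a".equals(s)) {
                list.remove(s);          // structural modification -> ConcurrentModificationException
            }
        }
    }
}
```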
Fail-safe
Collections that use the fail-safe mechanism do not iterate over the original contents directly; the original collection is copied first and the iteration runs over the copy.

Principle: since the iteration runs over a copy of the original collection, modifications made to the original collection during iteration cannot be detected by the iterator, so no ConcurrentModificationException is triggered.

Disadvantages: copying avoids the ConcurrentModificationException, but the iterator cannot see the modified content either: it traverses the copy taken when the iteration began and is unaware of any changes made to the original collection while it is iterating.

Scenario: the containers under the java.util.concurrent package are fail-safe and can be used and modified concurrently by multiple threads.
Both fail-fast and fail-safe describe iterator behavior.

Fail-fast: while iterating over a collection, if another thread modifies it, a ConcurrentModificationException is thrown. The collections under java.util are fail-fast.

Fail-safe: the underlying collection is copied for the iteration, so modifying the original collection does not affect the copy being traversed. The collections under java.util.concurrent are fail-safe.
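A minimal sketch (my own example) of the fail-safe behavior: a CopyOnWriteArrayList iterator walks a snapshot, so a concurrent modification neither throws nor becomes visible to that iterator.

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch: the iterator of a CopyOnWriteArrayList traverses the array snapshot taken
// when the iterator was created, so adding during iteration neither throws nor shows up.
public class FailSafeDemo {
    public static void main(String[] args) {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");

        Iterator<String> it = list.iterator();
        list.add("c");                        // modify after the iterator was created

        while (it.hasNext()) {
            System.out.println(it.next());    // prints only "a" and "b"
        }
        System.out.println(list.size());      // 3 -- the change is in the list itself
    }
}
```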
How to avoid fail-fast?

- If you need to remove elements while traversing in a single thread, call the remove method of the iterator (Iterator/ListIterator) instead of the remove method of the collection class. Looking at the source of the iterator's remove method in ArrayList below, note that it cannot remove an arbitrary element, only the element the iterator has just returned; a usage sketch follows this list.
```java
public void remove() {
    if (lastRet < 0)
        throw new IllegalStateException();
    checkForComodification();
    try {
        SubList.this.remove(lastRet);
        cursor = lastRet;
        lastRet = -1;
        // Re-sync the expected count so the removal does not trip the fail-fast check
        expectedModCount = ArrayList.this.modCount;
    } catch (IndexOutOfBoundsException ex) {
        throw new ConcurrentModificationException();
    }
}
```
- Use the concurrent collections under java.util.concurrent instead of ArrayList and HashMap:
  - CopyOnWriteArrayList instead of ArrayList
  - ConcurrentHashMap instead of HashMap
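Here is the usage sketch for the first option (my own example): removing through the iterator keeps expectedModCount in sync, so no exception is thrown.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: removing through the iterator updates expectedModCount along with modCount,
// so no ConcurrentModificationException is thrown.
public class SafeRemoveDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("b");
        list.add("c");

        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if ("b".equals(it.next())) {
                it.remove();              // removes the element last returned by next()
            }
        }
        System.out.println(list);         // [a, c]
    }
}
```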
The difference between Iterator and Enumeration
In Java collections, we typically iterate through collections by “Iterator” or “Enumeration”.
```java
public interface Enumeration<E> {
    boolean hasMoreElements();
    E nextElement();
}
```
```java
public interface Iterator<E> {
    boolean hasNext();
    E next();
    void remove();
}
```
- The interfaces differ. **Enumeration** declares only two methods, so with an Enumeration we can only read a collection's data, not modify it. **Iterator** declares three methods, so an Iterator can both read data from a collection and delete it.
- Iterator supports fail-fast; Enumeration does not. Enumeration was added in JDK 1.0, and the classes that use it, such as Vector and Hashtable, also date from JDK 1.0; its purpose is simply to provide a traversal interface. Enumeration itself adds no synchronization, but the Vector and Hashtable implementations are synchronized. Iterator was added in JDK 1.2 to provide traversal for collections such as HashMap and ArrayList, and it supports fail-fast: when multiple threads modify the contents of the same collection, a fail-fast event can be triggered.
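A minimal sketch (my own example) showing the two traversal styles side by side on a Vector:

```java
import java.util.Enumeration;
import java.util.Iterator;
import java.util.Vector;

// Sketch: traversing the same Vector with an Enumeration (read-only)
// and with an Iterator (which can also remove).
public class EnumerationVsIteratorDemo {
    public static void main(String[] args) {
        Vector<String> v = new Vector<>();
        v.add("a");
        v.add("b");
        v.add("c");

        Enumeration<String> en = v.elements();
        while (en.hasMoreElements()) {
            System.out.println(en.nextElement()); // can only read
        }

        Iterator<String> it = v.iterator();
        while (it.hasNext()) {
            if ("b".equals(it.next())) {
                it.remove();                      // an Iterator can remove as well
            }
        }
        System.out.println(v);                    // [a, c]
    }
}
```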
What is the difference between the Comparable and Comparator interfaces?
There are two ways to sort collection objects or arrays in Java:
- Have the object's class implement the Comparable interface

Comparable, in the java.lang package, is an interface with a single method, compareTo():

```java
public interface Comparable<T> {
    public int compareTo(T o);
}
```

Comparable allows objects of the implementing class to be compared according to the rule defined in compareTo(); this ordering is called the natural ordering. Lists or arrays whose elements implement Comparable can be sorted with Collections.sort() or Arrays.sort().

- Define a separate comparator that implements the Comparator interface

Comparator is an interface in the java.util package; before JDK 1.8 it declared only two methods:

```java
public interface Comparator<T> {
    public int compare(T lhs, T rhs);
    public boolean equals(Object object);
}
```

Comparable acts as an internal comparator, while Comparator acts as an external comparator.
Differences:

- Comparator is in the java.util package, while Comparable is in the java.lang package
- Comparable is implemented inside the class whose objects are being compared (an internal comparator), while Comparator is implemented outside that class (an external comparator)
- Implementing Comparable means overriding compareTo() and putting the comparison logic there; objects of a class that implements Comparable can then be sorted with **Collections.sort(list) or Arrays.sort(arr)**, and Collections.sort(list, Collections.reverseOrder()) sorts the list in reverse order
- Implementing Comparator means overriding the compare() method
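A minimal sketch (my own example, with a hypothetical Person class) showing both approaches: natural ordering via Comparable, and an external Comparator passed to Collections.sort().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch: Person has a natural order by age (Comparable); the name comparator is external.
public class SortDemo {
    static class Person implements Comparable<Person> {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }

        @Override
        public int compareTo(Person o) {          // internal comparator: natural order by age
            return Integer.compare(this.age, o.age);
        }

        @Override
        public String toString() { return name + "(" + age + ")"; }
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("Zoe", 25));
        people.add(new Person("Amy", 30));

        Collections.sort(people);                 // uses compareTo -> sorted by age
        System.out.println(people);               // [Zoe(25), Amy(30)]

        // external comparator: sort by name instead
        Collections.sort(people, Comparator.comparing((Person p) -> p.name));
        System.out.println(people);               // [Amy(30), Zoe(25)]
    }
}
```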
HashSet
A HashSet is a collection class that stores no duplicate elements, and it is unordered. The internal implementation of HashSet is based on HashMap and implements the Set interface.
As you can see from the constructors HashSet provides, every constructor except the last simply creates a HashMap and does nothing else; the last constructor is not public, so it is not exposed to callers.

How does a HashSet check for duplicates?

The bottom layer of a HashSet is really a HashMap: HashSet implements the Set interface and stores each element as a key, while the value is always the same shared dummy object. A HashMap does not allow duplicate keys, and when an existing key is put again the new value simply overwrites the old one and the old value is returned, so the set never ends up with duplicate elements.
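A minimal sketch (my own example): add() returns false when an equal element is already present in the underlying map, so duplicates are silently rejected; the check relies on the element's hashCode()/equals().

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: HashSet.add() delegates to the underlying HashMap's put(element, dummy);
// it returns false when an equal element (same hashCode()/equals()) already exists.
public class HashSetDuplicateDemo {
    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("java")); // true  -- first insertion
        System.out.println(set.add("java")); // false -- duplicate, set unchanged
        System.out.println(set.size());      // 1
    }
}
```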
What is the difference between Iterator and ListIterator?
- We can use Iterator to iterate over sets and lists, whereas ListIterator can only iterate over lists
- ListIterator has an add method that adds objects to a List, whereas Iterator does not
- Both ListIterator and Iterator have hasNext() and next() for forward traversal, but ListIterator additionally has hasPrevious() and previous() for backward traversal, which Iterator cannot do
- ListIterator can report the current position: nextIndex() and previousIndex() do this; Iterator cannot
- Both can delete elements, but ListIterator can also modify elements with set(); Iterator can only traverse and remove, not modify
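A minimal sketch (my own example) exercising what only ListIterator can do: set(), add(), index queries, and backward traversal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Sketch: ListIterator can modify, insert, report indexes, and walk backwards;
// a plain Iterator can only move forward and remove.
public class ListIteratorDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        ListIterator<String> it = list.listIterator();

        while (it.hasNext()) {
            int idx = it.nextIndex();        // index of the element about to be returned
            String s = it.next();
            if ("b".equals(s)) {
                it.set("B");                 // replace the last returned element
                it.add("b2");                // insert right after it
            }
            System.out.println(idx + ": " + s);
        }

        while (it.hasPrevious()) {           // traverse backwards from the end
            System.out.println(it.previous());
        }
        System.out.println(list);            // [a, B, b2, c]
    }
}
```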
Reference and thanks
All of the content above comes from reading the source code and consolidating earlier notes; the references are listed below.
www.javatpoint.com/java-arrayl…
www.runoob.com/java/java-c…
www.javazhiyin.com/21717.html
Yuqirong. Me / 2018/01/31 /…
Youzhixueyuan.com/the-underly…
Detailed analysis of the HashMap source www.tianxiaobo.com/2018/01/18/…
“Analysis of ConcurrentHashMap1.7 source” www.cnblogs.com/chengxiao/p…
www.justdojava.com/2019/12/18/…