Hash collision resolution

  • When it comes to storing key-value pairs, Map collections are the first thing that comes to mind. There are generally two ways to resolve the collisions a hash function produces: the separate-chaining (linked list) method and the open-addressing method. In Android application development these roughly correspond to HashMap and ArrayMap.

ArrayMap: Open address method

  • When the author used Retrofit to make network requests, the data had to be packed into a Map and sent to the back end. The amount of data was usually very small, so using ArrayMap instead of HashMap saved memory. Below is a source-code comparison of the two.
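  • A minimal usage sketch of that scenario (the parameter names and the Retrofit hint are made up for illustration; androidx.collection.ArrayMap is the support-library equivalent of android.util.ArrayMap):

    import androidx.collection.ArrayMap;
    import java.util.Map;

    public final class RequestParams {
        // A handful of request parameters: small and short-lived, exactly the case
        // where ArrayMap saves memory compared with HashMap.
        static Map<String, String> build() {
            Map<String, String> params = new ArrayMap<>();
            params.put("userId", "42");      // hypothetical field names
            params.put("page", "1");
            params.put("pageSize", "20");
            return params;                   // e.g. passed to a Retrofit @QueryMap parameter
        }
    }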
  1. First look at the constructors
    public ArrayMap() {
        this(0, false);
    }

    public ArrayMap(int capacity) {
        this(capacity, false);
    }

    /**
     * identityHashCode: if true, System.identityHashCode(key) is used, which depends on the
     * object's identity; if false, the key's own hashCode() method is used. String overrides
     * hashCode(), so two equal Strings act as the same key; a class that does not override it
     * falls back to the same native computation the identity hash uses.
     */
    public ArrayMap(int capacity, boolean identityHashCode) {
        mIdentityHashCode = identityHashCode;
        if (capacity < 0) {
            mHashes = EMPTY_IMMUTABLE_INTS;
            mArray = EmptyArray.OBJECT;
        } else if (capacity == 0) {
            mHashes = EmptyArray.INT;
            mArray = EmptyArray.OBJECT;
        } else {
            // Allocate the backing arrays for the requested capacity (possibly from the cache)
            allocArrays(capacity);
        }
        mSize = 0;
    }
  1. Understand the two backing arrays
  • An ArrayMap is made up of two arrays: mHashes and mArray
  • mHashes: an int array of the keys' hash values, kept sorted in ascending order: mHashes = [key1.hash, key2.hash, ...]. Two different keys may share the same hash value; that is fine.
  • mArray: an Object array storing the key-value pairs interleaved: mArray = [key1, value1, key2, value2, ...]
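  • A rough sketch of the layout for three entries (illustrative comments only, not code from the class):

    // mHashes: [ hashA, hashB, hashC ]                       sorted ascending by hash value
    // mArray : [ keyA, valueA, keyB, valueB, keyC, valueC ]  two slots per entry
    //
    // For the entry found at position i in mHashes:
    //   key   = mArray[i << 1];          // even slot
    //   value = mArray[(i << 1) + 1];    // odd slot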
  1. Look at the put() method
    @Override
    public V put(K key, V value) {
        final int osize = mSize;
        final int hash;
        int index;
        if (key == null) {
            // null keys are stored with hash 0
            hash = 0;
            index = indexOfNull();
        } else {
            // Use either the identity hash or the key's own hashCode(), as chosen in the constructor,
            // then find the index for this key
            hash = mIdentityHashCode ? System.identityHashCode(key) : key.hashCode();
            index = indexOf(key, hash);
        }
        if (index >= 0) {
            // Key already exists: replace its value in mArray and return the old value
            index = (index << 1) + 1;
            final V old = (V) mArray[index];
            mArray[index] = value;
            return old;
        }

        // indexOf() returned the bitwise complement of the insertion position
        index = ~index;
        if (osize >= mHashes.length) {
            // Arrays are full: grow to BASE_SIZE, then BASE_SIZE*2, then by 1.5x each time
            final int n = osize >= (BASE_SIZE * 2) ? (osize + (osize >> 1))
                    : (osize >= BASE_SIZE ? (BASE_SIZE * 2) : BASE_SIZE);

            if (DEBUG) Log.d(TAG, "put: grow from " + mHashes.length + " to " + n);

            final int[] ohashes = mHashes;
            final Object[] oarray = mArray;
            allocArrays(n);

            if (CONCURRENT_MODIFICATION_EXCEPTIONS && osize != mSize) {
                // Two threads expanded at the same time: like HashMap, ArrayMap is not thread-safe
                throw new ConcurrentModificationException();
            }

            if (mHashes.length > 0) {
                if (DEBUG) Log.d(TAG, "put: copy 0-" + osize + " to 0");
                System.arraycopy(ohashes, 0, mHashes, 0, ohashes.length);
                System.arraycopy(oarray, 0, mArray, 0, oarray.length);
            }

            // If the old arrays were of size 4 or 8, cache them for reuse
            freeArrays(ohashes, oarray, osize);
        }

        if (index < osize) {
            // Inserting in the middle: shift everything after the insertion point one slot to the right
            if (DEBUG) Log.d(TAG, "put: move " + index + "-" + (osize - index) + " to " + (index + 1));
            System.arraycopy(mHashes, index, mHashes, index + 1, osize - index);
            System.arraycopy(mArray, index << 1, mArray, (index + 1) << 1, (mSize - index) << 1);
        }

        if (CONCURRENT_MODIFICATION_EXCEPTIONS) {
            if (osize != mSize || index >= mHashes.length) {
                // Check again whether the map was modified concurrently
                throw new ConcurrentModificationException();
            }
        }

        // Write the new entry into mHashes and mArray
        mHashes[index] = hash;
        mArray[index << 1] = key;
        mArray[(index << 1) + 1] = value;
        mSize++;
        return null;
    }
  1. allocArrays(capacity) initializes the backing arrays, reusing cached arrays when possible
  • To understand the caching, first look at how the array size grows on expansion (BASE_SIZE is 4 in the put() code above).
    final int n = osize >= (BASE_SIZE*2) ? (osize+(osize>>1))
                    : (osize >= BASE_SIZE ? (BASE_SIZE*2) : BASE_SIZE);
  • The initial allocation is 4. The first expansion goes to 8 (4 × 2), and each expansion after that grows the array by half of its current size (1.5×): 8 → 12 → 18 → 27 and so on.
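  • A small sketch that reproduces the sizing rule from put() above (BASE_SIZE is 4 in the framework source):

    public final class ArrayMapGrowth {
        private static final int BASE_SIZE = 4;

        // Mirrors the expression in ArrayMap.put(): 0 -> 4 -> 8, then grow by 1.5x each time.
        static int nextSize(int osize) {
            return osize >= (BASE_SIZE * 2) ? (osize + (osize >> 1))
                    : (osize >= BASE_SIZE ? (BASE_SIZE * 2) : BASE_SIZE);
        }

        public static void main(String[] args) {
            int size = 0;
            for (int i = 0; i < 6; i++) {
                size = nextSize(size);
                System.out.print(size + " ");   // prints: 4 8 12 18 27 40
            }
        }
    }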
    private void allocArrays(final int size) {
        if (mHashes == EMPTY_IMMUTABLE_INTS) {
            throw new UnsupportedOperationException("ArrayMap is immutable");
        }
        // Arrays of the two small cached sizes are reused instead of being reallocated
        if (size == (BASE_SIZE * 2)) {
            synchronized (ArrayMap.class) {
                if (mTwiceBaseCache != null) {
                    // Reuse a cached pair of arrays sized for 8 entries
                    final Object[] array = mTwiceBaseCache;
                    mArray = array;
                    mTwiceBaseCache = (Object[]) array[0];
                    mHashes = (int[]) array[1];
                    array[0] = array[1] = null;
                    mTwiceBaseCacheSize--;
                    if (DEBUG) Log.d(TAG, "Retrieving 2x cache " + mHashes
                            + " now have " + mTwiceBaseCacheSize + " entries");
                    return;
                }
            }
        } else if (size == BASE_SIZE) {
            synchronized (ArrayMap.class) {
                if (mBaseCache != null) {
                    final Object[] array = mBaseCache;
                    mArray = array;
                    mBaseCache = (Object[]) array[0];
                    mHashes = (int[]) array[1];
                    array[0] = array[1] = null;
                    mBaseCacheSize--;
                    if (DEBUG) Log.d(TAG, "Retrieving 1x cache " + mHashes
                            + " now have " + mBaseCacheSize + " entries");
                    return;
                }
            }
        }

        // Cache miss: allocate fresh arrays (mArray is twice as long as mHashes)
        mHashes = new int[size];
        mArray = new Object[size << 1];
    }
  1. Finding an entry: indexOf(key, hash)
    int indexOf(Object key, int hash) {
        final int N = mSize;

        // Important fast case: if nothing is in here, nothing to look for.
        if (N == 0) {
            return ~0;
        }

        // Binary-search the sorted mHashes array for the hash. A negative result is the
        // bitwise complement of the position where the hash should be inserted (~~a == a).
        int index = binarySearchHashes(mHashes, N, hash);

        // If the hash code wasn't found, return the complemented insertion index.
        if (index < 0) {
            return index;
        }

        // If the key at the returned index matches, that's what we want.
        if (key.equals(mArray[index << 1])) {
            return index;
        }

        // Search for a matching key after the index (entries with the same hash are adjacent).
        int end;
        for (end = index + 1; end < N && mHashes[end] == hash; end++) {
            if (key.equals(mArray[end << 1])) return end;
        }

        // Search for a matching key before the index.
        for (int i = index - 1; i >= 0 && mHashes[i] == hash; i--) {
            if (key.equals(mArray[i << 1])) return i;
        }

        // Key not found: return the complement of the position where it would be inserted,
        // just past the run of entries that share this hash value.
        return ~end;
    }
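  • The ~index trick is the same convention java.util.Arrays.binarySearch uses: a negative result is the bitwise complement of the insertion point. A standalone illustration (not ArrayMap code):

    import java.util.Arrays;

    public final class ComplementDemo {
        public static void main(String[] args) {
            int[] sortedHashes = {10, 20, 40};

            System.out.println(Arrays.binarySearch(sortedHashes, 20));   // 1: found at index 1

            int miss = Arrays.binarySearch(sortedHashes, 30);
            System.out.println(miss);     // -3: not found
            System.out.println(~miss);    // 2: ~(-3) == 2, the slot where 30 would be inserted
        }
    }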
  1. The remove method:
  • Under certain conditions, removal reallocates (shrinks) the backing arrays so the memory held by an ArrayMap stays within a reasonable range and overall usage is reduced. However, reallocating on every remove() would waste a lot of time.
  • So Android trades space for time here and only shrinks when the arrays are clearly oversized: frequently allocating and reclaiming memory is always expensive.
    public V removeAt(int index) {
        final Object old = mArray[(index << 1) + 1];
        // osize is the number of entries actually stored; mHashes.length is the array capacity
        final int osize = mSize;
        final int nsize;
        if (osize <= 1) {
            // Now empty: release the arrays (possibly into the cache)
            if (DEBUG) Log.d(TAG, "remove: shrink from " + mHashes.length + " to 0");
            final int[] ohashes = mHashes;
            final Object[] oarray = mArray;
            mHashes = EmptyArray.INT;
            mArray = EmptyArray.OBJECT;
            freeArrays(ohashes, oarray, osize);
            nsize = 0;
        } else {
            nsize = osize - 1;
            if (mHashes.length > (BASE_SIZE * 2) && mSize < mHashes.length / 3) {
                // Shrink only when the capacity is well above BASE_SIZE*2 and the map is
                // less than 1/3 full; the new size still leaves roughly 1.5x headroom.
                final int n = osize > (BASE_SIZE * 2) ? (osize + (osize >> 1)) : (BASE_SIZE * 2);

                if (DEBUG) Log.d(TAG, "remove: shrink from " + mHashes.length + " to " + n);

                final int[] ohashes = mHashes;
                final Object[] oarray = mArray;
                allocArrays(n);

                if (CONCURRENT_MODIFICATION_EXCEPTIONS && osize != mSize) {
                    // Concurrent modification detected: not thread-safe
                    throw new ConcurrentModificationException();
                }

                if (index > 0) {
                    if (DEBUG) Log.d(TAG, "remove: copy from 0-" + index + " to 0");
                    System.arraycopy(ohashes, 0, mHashes, 0, index);
                    System.arraycopy(oarray, 0, mArray, 0, index << 1);
                }
                if (index < nsize) {
                    if (DEBUG) Log.d(TAG, "remove: copy from " + (index + 1) + "-" + nsize
                            + " to " + index);
                    System.arraycopy(ohashes, index + 1, mHashes, index, nsize - index);
                    System.arraycopy(oarray, (index + 1) << 1, mArray, index << 1,
                            (nsize - index) << 1);
                }
            } else {
                // No shrink needed: just shift everything after index one slot to the left
                if (index < nsize) {
                    if (DEBUG) Log.d(TAG, "remove: move " + (index + 1) + "-" + nsize
                            + " to " + index);
                    System.arraycopy(mHashes, index + 1, mHashes, index, nsize - index);
                    System.arraycopy(mArray, (index + 1) << 1, mArray, index << 1,
                            (nsize - index) << 1);
                }
                mArray[nsize << 1] = null;
                mArray[(nsize << 1) + 1] = null;
            }
        }
        if (CONCURRENT_MODIFICATION_EXCEPTIONS && osize != mSize) {
            throw new ConcurrentModificationException();
        }
        mSize = nsize;
        return (V) old;
    }
  1. freeArrays(final int[] hashes, final Object[] array, final int size) and allocArrays(int size)
    • Note that hashes is half the length of array (array stores two slots, key and value, per entry)
    private static void freeArrays(final int[] hashes, final Object[] array, final int size) {
        if (hashes.length == (BASE_SIZE * 2)) {
            synchronized (ArrayMap.class) {
                if (mTwiceBaseCacheSize < CACHE_SIZE) {
                    // Build a linked list through slot 0: array[0] points at the previously
                    // cached array pair, array[1] keeps the matching int[] hashes (length 8;
                    // the Object[] itself stays length 16). At most CACHE_SIZE (10) are kept.
                    array[0] = mTwiceBaseCache;
                    array[1] = hashes;
                    for (int i = (size << 1) - 1; i >= 2; i--) {
                        array[i] = null;          // clear the payload slots before caching
                    }
                    mTwiceBaseCache = array;
                    mTwiceBaseCacheSize++;
                    if (DEBUG) Log.d(TAG, "Storing 2x cache " + array
                            + " now have " + mTwiceBaseCacheSize + " entries");
                }
            }
        } else if (hashes.length == BASE_SIZE) {
            synchronized (ArrayMap.class) {
                if (mBaseCacheSize < CACHE_SIZE) {
                    array[0] = mBaseCache;
                    array[1] = hashes;
                    for (int i = (size << 1) - 1; i >= 2; i--) {
                        array[i] = null;
                    }
                    mBaseCache = array;
                    mBaseCacheSize++;
                    if (DEBUG) Log.d(TAG, "Storing 1x cache " + array
                            + " now have " + mBaseCacheSize + " entries");
                }
            }
        }
    }
  1. Look at allocArrays again
    private void allocArrays(final int size) {
        if (mHashes == EMPTY_IMMUTABLE_INTS) {
            throw new UnsupportedOperationException("ArrayMap is immutable");
        }
        if (size == (BASE_SIZE * 2)) {
            synchronized (ArrayMap.class) {
                if (mTwiceBaseCache != null) {
                    final Object[] array = mTwiceBaseCache;
                    mArray = array;
                    mTwiceBaseCache = (Object[]) array[0];   // pop the next cached pair off the chain
                    mHashes = (int[]) array[1];
                    array[0] = array[1] = null;
                    mTwiceBaseCacheSize--;
                    if (DEBUG) Log.d(TAG, "Retrieving 2x cache " + mHashes
                            + " now have " + mTwiceBaseCacheSize + " entries");
                    return;
                }
            }
        } else if (size == BASE_SIZE) {
            // If the cache is non-empty, 'array' is the head of the cached chain (still length 8).
            // It is simply handed back as mArray; slots 0 and 1 are cleared below, so the chain
            // structure does not leak into the new map and nothing else needs to be undone.
            synchronized (ArrayMap.class) {
                if (mBaseCache != null) {
                    final Object[] array = mBaseCache;
                    mArray = array;
                    mBaseCache = (Object[]) array[0];
                    mHashes = (int[]) array[1];              // the cached int[4] becomes mHashes
                    array[0] = array[1] = null;
                    mBaseCacheSize--;
                    if (DEBUG) Log.d(TAG, "Retrieving 1x cache " + mHashes
                            + " now have " + mBaseCacheSize + " entries");
                    return;
                }
            }
        }

        mHashes = new int[size];
        mArray = new Object[size << 1];
    }
  • The cached arrays chain together roughly as sketched below
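  • A textual approximation of that structure, read from the freeArrays()/allocArrays() source above (not the original diagram):

    // Each freed Object[] of length 16 (holding BASE_SIZE*2 = 8 entries) is pushed onto a static chain:
    //
    //   mTwiceBaseCache ──► arrayC ──► arrayB ──► arrayA ──► null    (newest cached pair first)
    //                       arrayC[0] = arrayB                       slot 0 links to the next cached pair
    //                       arrayC[1] = int[8]                       slot 1 keeps the matching hash array
    //                       arrayC[2..15] = null                     payload slots cleared before caching
    //
    // mBaseCache holds the same structure for the smaller size (BASE_SIZE = 4 entries).
    // At most CACHE_SIZE (10) array pairs are kept per chain, shared by every ArrayMap instance.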
  1. In particular, why is each cache capped at 10 entries?
  • An easily missed detail: the cache heads and their sizes are static members, so they belong to the class rather than to any instance and are shared by all ArrayMaps in the process. No matter how many ArrayMaps you create, at most 10 array pairs are kept in each of the two cache pools, a bit like a thread pool.
    static Object[] mBaseCache; 
    static int mBaseCacheSize;
    static Object[] mTwiceBaseCache;
    static int mTwiceBaseCacheSize;

SparseArray

  • Like ArrayMap, it is backed by two arrays; unlike ArrayMap, it is not a Map implementation and its keys are primitive ints
    public class SparseArray<E> implements Cloneable {
        // Placeholder marking a slot whose value has been deleted but not yet compacted
        private static final Object DELETED = new Object();
        // True when at least one slot is marked DELETED; a later gc() pass may compact it away
        private boolean mGarbage = false;

        // Keys are primitive ints, kept sorted and unique, with the value at the same index.
        // Works well for small, roughly sequential int keys, e.g. viewType in RecyclerView.
        private int[] mKeys;
        private Object[] mValues;
        private int mSize;

        public SparseArray() {
            // Default initial capacity of 10 for both arrays
            this(10);
        }
    }
  • Storing a value: put()
    public void put(int key, E value) {
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

        if (i >= 0) {
            // Key already present: just replace the value
            mValues[i] = value;
        } else {
            i = ~i;   // insertion position

            if (i < mSize && mValues[i] == DELETED) {
                // The slot at the insertion position is marked DELETED, so it can be reused directly
                mKeys[i] = key;
                mValues[i] = value;
                return;
            }

            if (mGarbage && mSize >= mKeys.length) {
                // Compact away the DELETED slots before growing
                gc();

                // Search again because indices have changed.
                i = ~ContainerHelpers.binarySearch(mKeys, mSize, key);
            }

            // Insert, growing the arrays if necessary
            mKeys = GrowingArrayUtils.insert(mKeys, mSize, i, key);
            mValues = GrowingArrayUtils.insert(mValues, mSize, i, value);
            mSize++;
        }
    }
  • put() steps:
    1. If binary search finds the key at position i, the existing value is simply replaced
    2. If the key is not present, ~i gives the insertion position. Two sub-cases: if that slot is currently marked DELETED it can be reused directly; otherwise, if garbage exists and the array is full, gc() first compacts the arrays (dropping the DELETED slots) and the insertion position is recomputed
    3. If the array is full and there are no DELETED slots to reclaim, the array is grown before inserting
  • The gc() here is interesting: it uses a two-pointer pass to compact the arrays, dropping every slot marked DELETED
    private void gc() {
        // Log.e("SparseArray", "gc start with " + mSize);
        int n = mSize;
        int o = 0;                       // write pointer: next slot to keep
        int[] keys = mKeys;
        Object[] values = mValues;

        for (int i = 0; i < n; i++) {    // read pointer
            Object val = values[i];
            if (val != DELETED) {
                if (i != o) {
                    keys[o] = keys[i];   // shift live entries down over the deleted ones
                    values[o] = val;
                    values[i] = null;
                }
                o++;
            }
        }

        mGarbage = false;
        mSize = o;
        // Log.e("SparseArray", "gc end with " + mSize);
    }
  • get() and delete() are straightforward; delete() simply marks the slot as DELETED instead of actually shifting the array
    public E get(int key, E valueIfKeyNotFound) {
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

        if (i < 0 || mValues[i] == DELETED) {
            return valueIfKeyNotFound;
        } else {
            return (E) mValues[i];
        }
    }

    /**
     * Removes the mapping from the specified key, if there was any.
     */
    public void delete(int key) {
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);

        if (i >= 0) {
            if (mValues[i] != DELETED) {
                mValues[i] = DELETED;   // mark only; the slot is compacted later by gc()
                mGarbage = true;
            }
        }
    }
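  • A small usage sketch based on the RecyclerView viewType scenario mentioned above (the type constants and layout names are made up for illustration):

    import android.util.SparseArray;

    public final class ViewTypeRegistry {
        static final int TYPE_HEADER = 0;   // hypothetical view type constants
        static final int TYPE_ITEM = 1;

        private final SparseArray<String> layouts = new SparseArray<>();

        ViewTypeRegistry() {
            // Primitive int keys avoid boxing, unlike HashMap<Integer, String>
            layouts.put(TYPE_HEADER, "layout_header");
            layouts.put(TYPE_ITEM, "layout_item");
        }

        String layoutFor(int viewType) {
            // The second argument is returned when the key is absent (or its slot is marked DELETED)
            return layouts.get(viewType, "layout_item");
        }
    }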

HashMap

  • For large amounts of data, HashMap is still recommended: inserting into an ArrayMap costs O(n) per operation (binary search plus array shifting), while HashMap has O(1) average insertion and O(log n) in the worst case once a bucket has become a red-black tree. Which to use depends on the amount of data; like many algorithm choices, it is ultimately a trade-off between time and space.
  1. Data structure: a HashMap is essentially a "hashed array of linked lists", i.e. a combination of an array and linked lists (plus red-black trees since JDK 1.8);

    • Set the initial capacity and load factor according to the use case
    • HashMap implements the Map interface and stores key-value pairs; duplicate keys are not allowed in a Map
    • Two common Map implementations are TreeMap and HashMap
      • TreeMap keeps its entries ordered; HashMap does not
      • A HashMap may contain a null key and null values. It is not thread-safe, but it can be wrapped with the static method Collections.synchronizedMap() (a usage sketch follows this list)
    • A HashMap computes a hash from the key object's hashCode
    • A HashMap is fast because it looks objects up directly by their (unique) keys
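    • A minimal sketch of that wrapping (note that iteration over the wrapper still has to be synchronized manually):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public final class SyncMapDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());
            map.put("a", 1);      // individual calls are synchronized by the wrapper
            map.put(null, 2);     // HashMap (unlike Hashtable) accepts a null key

            synchronized (map) {  // iteration must be guarded on the wrapper itself
                for (Map.Entry<String, Integer> e : map.entrySet()) {
                    System.out.println(e.getKey() + " = " + e.getValue());
                }
            }
        }
    }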
  2. Source code analysis:

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
        Node<K, V>[] tab;
        Node<K, V> p;
        int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            // The table is null or empty: lazily initialize it via resize()
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            // No node at this bucket yet: place a new node directly
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K, V> e;
            K k;
            if (p.hash == hash &&
                    ((k = p.key) == key || (key != null && key.equals(k))))
                // Same hash and equal key at the head of the bucket: remember it for replacement
                e = p;
            else if (p instanceof TreeNode)
                // The bucket is already a red-black tree: insert into the tree
                e = ((TreeNode<K, V>) p).putTreeVal(this, tab, hash, key, value);
            else {
                // The bucket is a linked list: walk it and append at the tail
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        /*
                         * If the list length reaches 8 (TREEIFY_THRESHOLD), convert it to a red-black tree.
                         * With a tree the average search length is about log(8) = 3; with a list it would be
                         * 8/2 = 4, so converting is worthwhile. At 6 or below the list is short enough
                         * (6/2 = 3 comparisons on average) that building a tree is not worth the conversion
                         * cost. The gap between 6 and 8 keeps repeated puts/removes from flipping the bucket
                         * back and forth between list and tree, which would hurt efficiency.
                         */
                        if (binCount >= TREEIFY_THRESHOLD - 1)
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                            ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        // Grow the table once size exceeds the threshold (capacity * load factor)
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }
    1. From the source code we can see:

      1. When we put an element into a HashMap, the key's hashCode is first re-hashed, and that hash determines the element's index in the array.

        1. If other elements already sit at that array index, they are stored there as a linked list. In JDK 1.7 the new element is placed at the head of the chain; since JDK 1.8 it is appended at the tail. If the list already contains the same key, its value is replaced with the new one and the old value is returned.
        2. If nothing is stored at that position yet, the element is placed directly into that array slot;
        3. Ideally the elements in a HashMap are spread as evenly as possible, with only one element per slot. Then, once the hash algorithm gives us the position, we immediately have the element we want without walking a linked list, which greatly improves query efficiency.
        4. hash(int h) is the method that computes the hash. In principle the index is the hash modulo the array length (%), but the modulo operation is relatively expensive, so a bit mask is used instead:
    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        // The underlying array length is always a power of two, so
        // h & (length - 1) is equivalent to h % length, but & is cheaper than %.
        return h & (length - 1);
    }

    // From the JDK 1.7 constructor: find a power of 2 >= initialCapacity
    int capacity = 1;
    while (capacity < initialCapacity)
        capacity <<= 1;   // doubled each time, so the capacity is always a power of two
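        A quick standalone check (not JDK source) that the mask matches the modulo when the length is a power of two, and how a non-power-of-two length wastes slots:

    public final class IndexForDemo {
        static int indexFor(int h, int length) {
            return h & (length - 1);
        }

        public static void main(String[] args) {
            int length = 16;                                   // a power of two
            for (int h : new int[]{8, 9, 31, 0x7FFF1234}) {
                System.out.println(indexFor(h, length) + " == " + (h % length));
            }
            // With length = 15, (length - 1) is binary 1110: the lowest bit of the index is
            // always 0, so odd slots can never be used and collisions become more likely.
            System.out.println(indexFor(8, 15) + " vs " + indexFor(9, 15));   // both print 8
        }
    }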
        1. Why the capacity is always a power of 2

          Analysis:

          1. Take hash values 8 and 9 with an array length of 15: both ANDed with 15 - 1 (binary 1110) give the same result, so they land in the same array slot and collide; to retrieve 8 or 9 we then have to walk the linked list at that slot, which hurts query efficiency. Notice also that when the length is 15, the hash is ANDed with 1110, so the lowest bit of the index is always 0: slots 0001, 0011, 0101, 0111, 1001, 1011 and 1101 can never hold an element. That wastes a considerable amount of space and, worse, leaves far fewer usable slots than the array length, which further increases the chance of collisions and slows down queries!

          2. When the array length is 16, i.e. a power of two 2^n, every bit of 2^n - 1 is 1, so the AND keeps the low bits of the hash unchanged. In addition, the hash(int h) method further mixes the key's hashCode by folding in the high bits, so only keys whose mixed hashes are genuinely equal end up in the same array slot and form a linked list.

          3. So when the array length is a power of two, different keys are less likely to compute the same index: the data is spread more evenly across the array, collisions are rarer, and queries rarely need to walk a list, making lookups faster.

    • Conclusion: when a program puts a key-value pair into a HashMap, it first decides where to store the Entry based on the return value of the key's hashCode(). If two Entry keys hash to the same bucket, they are stored at the same location. If the two keys also return true from equals(), the new Entry's value overwrites the existing Entry's value, but the key stays the same. If equals() returns false, the new Entry forms an Entry chain with the existing Entry at that location, with the new Entry at the head of the chain in JDK 1.7's addEntry() (JDK 1.8 appends at the tail instead).

      1. Expansion mechanism of HashMap

        1. When to expand: when the threshold is reached, i.e. array size × load factor; by default 16 × 0.75 = 12
        2. Since JDK 1.8, when a bucket's list length reaches 8 it is converted to a red-black tree, and when a tree shrinks to 6 or fewer nodes it is converted back to a linked list. The gap (7) prevents repeated put/remove calls around the threshold from constantly converting back and forth, which would be inefficient;
    if (loHead != null) {
        if (lc <= UNTREEIFY_THRESHOLD)       // 6: small enough to go back to a plain list
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            if (hiHead != null)              // (else is already treeified)
                loHead.treeify(tab);
        }
    }

    HashMap, summarized

    1. Simply put, HashMap treats each key-value pair as a whole: an Entry object. Internally it uses an Entry[] array to store all key-value pairs. When an Entry needs to be stored, its slot in the array is determined by the hash algorithm and its position within the bucket's linked list is determined by equals(). When an Entry needs to be retrieved, the hash algorithm locates its slot in the array, and equals() picks the right Entry out of the list at that slot.

    2. HashMap's default initial capacity is 16. An initial capacity can be passed to the constructor; if it is not a power of two, HashMap rounds it up to the next power of two and uses that as the initial length.

    3. The default load factor is 0.75: once the number of elements exceeds 16 × 0.75 = 12, the array is doubled to 32 and the position of every element is recalculated, which is an expensive operation. If we already know roughly how many elements the HashMap will hold, presetting the capacity improves performance (see the sketch below). A larger load factor uses space more fully but lowers query efficiency; a smaller one leaves the table too sparse and wastes space (another time vs. space trade-off: 0.75 is the chosen balance).
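    A sketch of that pre-sizing idea (1000 is an arbitrary example figure): size the map up front so that filling it never triggers a resize-and-rehash pass.

    import java.util.HashMap;
    import java.util.Map;

    public final class PresizedMap {
        static Map<String, String> forExpectedSize(int expected) {
            // Choose a capacity so that 'expected' entries stay below the 0.75 threshold;
            // HashMap then rounds this up to the next power of two internally.
            int capacity = (int) (expected / 0.75f) + 1;
            return new HashMap<>(capacity);
        }

        public static void main(String[] args) {
            Map<String, String> map = forExpectedSize(1000);   // no resize while filling 1000 entries
            for (int i = 0; i < 1000; i++) {
                map.put("key" + i, "value" + i);
            }
            System.out.println(map.size());
        }
    }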

    4. Detection mechanism under concurrent modification: fail-fast. The number of structural changes is tracked in the modCount field; an iterator records it when it is created and compares it against the current value on each subsequent next()

    HashIterator() {
        expectedModCount = modCount;        // snapshot the modification count at creation
        if (size > 0) { // advance to first entry
            Entry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
    }

    final Entry<K,V> nextEntry() {
        if (modCount != expectedModCount)   // the map was structurally modified behind our back
            throw new ConcurrentModificationException();
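      A small demonstration of the fail-fast check: mutating the map directly during iteration makes modCount and expectedModCount diverge, while Iterator.remove() keeps them in step (a standalone sketch, not JDK source):

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    public final class FailFastDemo {
        public static void main(String[] args) {
            Map<String, Integer> map = new HashMap<>();
            map.put("a", 1);
            map.put("b", 2);
            map.put("c", 3);

            // Safe removal: Iterator.remove() updates expectedModCount along with modCount.
            Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
            while (it.hasNext()) {
                if (it.next().getValue() == 1) {
                    it.remove();
                }
            }

            // Unsafe removal: map.remove() bumps modCount behind the iterator's back,
            // so the next call to next() throws ConcurrentModificationException.
            for (String key : map.keySet()) {
                map.remove(key);
            }
        }
    }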
    5. In multithreaded code, ConcurrentHashMap is recommended instead

    6. Traversal of Map,

      1. Use entrySet to get an Entry collection of key-value pairs, traversed only once

      2. Use keySet to iterate over the keys and then call get(key) for each one, which effectively traverses twice

      3. forEach() walks the table array, and for each bucket follows the chain of nodes in turn

    public final void forEach(Consumer<? super Map.Entry<K,V>> action) {
        Node<K,V>[] tab;
        if (action == null)
            throw new NullPointerException();
        if (size > 0 && (tab = table) != null) {
            int mc = modCount;
            // Android-changed: Detect changes to modCount early.
            for (int i = 0; (i < tab.length && modCount == mc); ++i) {
                for (Node<K,V> e = tab[i]; e != null; e = e.next)   // walk each bucket's chain
                    action.accept(e);
            }
            if (modCount != mc)
                throw new ConcurrentModificationException();
        }
    }

    // The entrySet traversal mentioned in point 1:
    Map map = new HashMap();
    Iterator iter = map.entrySet().iterator();
    while (iter.hasNext()) {
        Map.Entry entry = (Map.Entry) iter.next();
        Object key = entry.getKey();
        Object val = entry.getValue();
    }

Analysis: HashMap head-insertion during expansion can cause an infinite loop

  • In JDK 1.7 and earlier, HashMap's expansion inserted elements at the head of each bucket. Under multithreaded put operations this can create a circular linked list, so a node's next is never null and a subsequent get of that entry spins forever
  • In JDK1.7, the capacity expansion function is:
    void transfer(Entry[] newTable) {
        Entry[] src = table;                     // the old hash table
        int newCapacity = newTable.length;
        // Move every element from the old table into the new table
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            if (e != null) {
                src[j] = null;
                do {
                    Entry<K,V> next = e.next;    // e.g. e = 3, next = 7 when the bucket is 3 -> 7
                    int i = indexFor(e.hash, newCapacity);
                    e.next = newTable[i];        // head insertion: e points at the current bucket head
                    newTable[i] = e;
                    e = next;
                } while (e != null);
            }
        }
    }
  • Suppose threads A and B both trigger an expansion of the same HashMap:
    1. Both threads run the transfer() function above and reach the line Entry<K,V> next = e.next;
    2. Take a bucket whose list is 3 -> 7. In thread A, e = 3 and next = 7 (3 still points to 7 in the old table); at this point thread A is suspended.
    3. Thread B then runs to completion. With head insertion, 3 is moved into the new table first and 7 is then inserted in front of it, so after B finishes the bucket is 7 -> 3, i.e. 7.next = 3.
    4. Thread A now resumes and executes e.next = newTable[i]. For thread A, e is still 3, but newTable[i] has already been set to 7 by thread B, so 3.next becomes 7 instead of null. Together with 7.next = 3 (which thread A never touches), the two nodes now point at each other, the bucket contains a cycle, and subsequent put/get calls loop forever.
    5. To prevent this, JDK 1.8 appends elements at the tail during resize instead. Even so, a HashMap gives no guarantees under concurrent modification: updates can still be lost!

Hashtable(default size 11, load factor 0.75)

  • Overview

    Like HashMap, Hashtable is a hash table that stores key-value pairs and implements Map<K, V>

  • Member variables are defined as table, count, threshold, loadFactor, modCount

    • table is an Entry[] array; Entry is a singly linked list node, and the key-value pairs are stored in Entry nodes
    • count is the size of the Hashtable: the number of key-value pairs it holds
    • threshold is the size at which the table is rehashed: capacity × load factor
    • loadFactor is the load factor
    • modCount is used to implement the fail-fast mechanism
  • Put method

    1. If value is null, a NullPointerException is thrown
    2. If the table[index] bucket is not empty, it is traversed; if an entry with the same key is found, its value is replaced and the old value returned
    3. Otherwise the new entry is inserted at table[index], at the head of that bucket's list
    public synchronized V put(K key, V value) {
        // Make sure the value is not null
        if (value == null) {
            throw new NullPointerException();
        }

        // Makes sure the key is not already in the hashtable:
        // walk the list at the index position; if an entry with the same key exists,
        // replace its value and return the old value.
        Entry tab[] = table;
        int hash = hash(key);
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry<K,V> e = tab[index]; e != null; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }

        modCount++;
        if (count >= threshold) {
            // Rehash the table if the threshold is exceeded, then recompute the index
            rehash();
            tab = table;
            hash = hash(key);
            index = (hash & 0x7FFFFFFF) % tab.length;
        }

        // Creates the new entry: insert it at the head of the bucket at index,
        // with e (the old head) as the next element of the new Entry.
        Entry<K,V> e = tab[index];
        tab[index] = new Entry<>(hash, key, value, e);
        count++;
        return null;
    }

Comparison of Hashtable and HashMap

  1. HashMap is not thread-safe; Hashtable is thread-safe: the methods inside Hashtable are basically all synchronized. (If you need a thread-safe map, use ConcurrentHashMap instead!);

    1. Efficiency: because it does no locking, HashMap is somewhat more efficient than Hashtable. Hashtable is also largely obsolete, so avoid it in new code;
    2. Support for null keys and values: in a HashMap, null can be used as a key (there can be only one such key) and any number of values may be null. Hashtable throws a NullPointerException as soon as a null key or value is put into it (see the sketch after this list).
    3. Initial capacity and growth: (1) Without an explicit initial capacity, Hashtable starts at 11 and grows to 2n + 1 on each rehash; HashMap starts at 16 and doubles each time. (2) With an explicit initial capacity, Hashtable uses the given size directly, while HashMap rounds it up to the next power of two; that is, HashMap always uses a power of two as the size of its hash table.
    4. Underlying data structure: since JDK 1.8, HashMap changes how it resolves long collision chains, converting a bucket's list into a red-black tree once the list length exceeds the threshold (8 by default) to reduce search time. Hashtable has no such mechanism.
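    A short sketch of the null-handling difference in point 2:

    import java.util.HashMap;
    import java.util.Hashtable;
    import java.util.Map;

    public final class NullKeyDemo {
        public static void main(String[] args) {
            Map<String, String> hashMap = new HashMap<>();
            hashMap.put(null, "ok");                  // allowed: the null key lives in bucket 0
            hashMap.put("k", null);                   // null values are allowed too
            System.out.println(hashMap.get(null));    // prints "ok"

            Map<String, String> table = new Hashtable<>();
            try {
                table.put(null, "boom");              // Hashtable rejects null keys ...
            } catch (NullPointerException e) {
                System.out.println("Hashtable: null key rejected");
            }
            try {
                table.put("k", null);                 // ... and null values
            } catch (NullPointerException e) {
                System.out.println("Hashtable: null value rejected");
            }
        }
    }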
HashMap and the JDK 1.8 red-black tree
  • First, linked lists are easy to maintain but poor for searching: O(n). A red-black tree is a special binary search tree that is good for searching (binary search, O(log n)) but more complex to maintain (left rotations, right rotations and recoloring are needed).
  • So buckets with few collisions (6 or fewer nodes) stay as linked lists, while buckets with many collisions (8 or more) are converted to red-black trees, making insert, delete, update and search faster overall
    • A red-black tree is chosen to fix the weakness of a plain binary search tree, which in unlucky cases degenerates into a linear structure (the same depth problem as the original linked list) where traversal and lookup become very slow. After inserting new data, a red-black tree may need left rotations, right rotations and recoloring to stay balanced; it is introduced precisely to speed up lookups and solve the list's depth problem. Maintaining that balance has a cost, but the cost is still lower than walking a long linear list. That is why a bucket switches to a red-black tree only when its length exceeds 8, and switches back to a list when it shrinks below 6: for such short chains a red-black tree is simply not worth the overhead.
Red-black trees
  • **When hash values are equal, the red-black tree orders keys via compareTo():** for a common String key, for example, two different strings with the same hash value are ordered by String's compareTo() to decide where they sit in the tree, so the tree never stores truly identical keys; if two keys are identical (equals() returns true), the value is simply replaced
  • A brief description of what a red-black tree is: its defining properties
    1. The root node is black
    2. A red node cannot have a red parent or a red child: all children of a red node are black
    3. Every path from the root to any leaf passes through the same number of black nodes
    4. All empty leaf nodes are black; the leaves here are the null (NIL) nodes, which are usually omitted from drawings but counted when reasoning about the tree
    • Note: properties 2 and 3 guarantee that no path is more than twice as long as any other, so the tree stays roughly balanced. Compared with an ordinary binary search tree, which can degenerate into a linked list, this is a big improvement; and compared with a complete binary tree or a strict AVL balanced tree (where the heights of the left and right subtrees may differ by at most 1 and every subtree is itself balanced), a red-black tree needs fewer rotations to restore its invariants, so it achieves a reasonably balanced result with less rebalancing work. Resolving hash collisions this way is, once again, just trading space for time
  • Understanding left and right rotations

  • Analysis:
    1. Left-rotating a node X makes X's right child the new parent of X, i.e. X becomes a left child ==> the "left" in left rotation means the rotated node ends up as a left child
    2. Right-rotating a node P makes P's left child the new parent of P, i.e. P becomes a right child ==> the "right" in right rotation means the rotated node ends up as a right child
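  • A minimal left-rotation sketch on a bare node type (field names are illustrative, not the TreeMap/HashMap source):

    public final class RotationSketch {
        static final class Node {
            int key;
            Node left, right, parent;
            Node(int key) { this.key = key; }
        }

        /** Left-rotate around x: x's right child r takes x's place, and x becomes r's left child. */
        static Node rotateLeft(Node root, Node x) {
            Node r = x.right;
            x.right = r.left;                    // r's left subtree becomes x's right subtree
            if (r.left != null) r.left.parent = x;
            r.parent = x.parent;                 // r takes over x's position under x's parent
            if (x.parent == null) {
                root = r;
            } else if (x == x.parent.left) {
                x.parent.left = r;
            } else {
                x.parent.right = r;
            }
            r.left = x;                          // x ends up as the left child of r
            x.parent = r;
            return root;                         // return the (possibly new) root
        }
    }
  • A right rotation is the mirror image: the node's left child takes its place and the node itself becomes a right child.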
Search time complexity O(logN)
  • For red-black tree lookups
insert
  • Note:
    1. A node inserted into a red-black tree always starts out at the bottom, in a leaf position
    2. The inserted node is colored red, not black: by property 3 every path must contain the same number of black nodes, so inserting a black node would immediately change that count and force extra rotations and recoloring, whereas inserting a red node may require no fix-up at all
  • Let N be the node being inserted, P its parent, G its grandparent and U its uncle (P's sibling). Insertion then falls into the following cases:
    1. N is the root node, the first node in a red-black tree
    2. The parent node P of N is black
    3. P is red, not the root (the root must be black), and its sibling U is also red
    4. P is red and U is black
  • Cases 1, 2 and 3 involve no rotation, only recoloring at most
    1. Cases 1 and 2 do not disturb the balance of the red-black tree at all; the node is simply inserted
    2. For case 3, both P and U are red, so we only recolor: P and U become black and G becomes red; if G is the root it is simply recolored back to black
  • For case 4, P is red and U is black; this splits into four sub-cases:
    1. P is the left child of G and N is the left child of P: right-rotate the grandparent G
    2. P is the left child of G and N is the right child of P: first left-rotate P (reducing to sub-case 1), then right-rotate the grandparent G
    3. P is the right child of G and N is the right child of P: left-rotate the grandparent G
    4. P is the right child of G and N is the left child of P: first right-rotate P (reducing to sub-case 3), then left-rotate the grandparent G
  • Since a new node can only be added at a leaf position, these four sub-cases cover everything; note that a "black" U may actually be a NIL node
delete
  • A delete operation falls into only four cases:
    1. Neither the left nor the right subtree of the deleted node is NIL.
    2. The left subtree of the deleted node is NIL and the right subtree is non-empty.
    3. The right subtree of the deleted node is NIL and the left subtree is non-empty.
    4. Both subtrees of the deleted node are NIL.

In the TreeMap source code you can see these four cases handled. For case 1, the in-order successor's key and value are copied into the node, and the deletion is then carried out on the successor, which has at most one non-empty child:

    private void deleteEntry(TreeMapEntry<K,V> p) {
        modCount++;
        size--;

        // If strictly internal (both subtrees non-NIL), copy the successor's element to p
        // and then make p point to the successor, which has at most one child.
        if (p.left != null && p.right != null) {
            TreeMapEntry<K,V> s = successor(p);
            p.key = s.key;
            p.value = s.value;
            p = s;
        } // p has 2 children

        // Start fixup at replacement node, if it exists.
        TreeMapEntry<K,V> replacement = (p.left != null ? p.left : p.right);

        if (replacement != null) {
            // Link replacement to parent
            replacement.parent = p.parent;
            if (p.parent == null)
                root = replacement;
            else if (p == p.parent.left)
                p.parent.left = replacement;
            else
                p.parent.right = replacement;

            // Null out links so they are OK to use by fixAfterDeletion.
            p.left = p.right = p.parent = null;

            // Fix replacement
            if (p.color == BLACK)
                fixAfterDeletion(replacement);
        } else if (p.parent == null) { // return if we are the only node.
            root = null;
        } else { // No children (both subtrees are NIL): use self as phantom replacement and unlink.
            if (p.color == BLACK)
                fixAfterDeletion(p);

            if (p.parent != null) {
                if (p == p.parent.left)
                    p.parent.left = null;
                else if (p == p.parent.right)
                    p.parent.right = null;
                p.parent = null;
            }
        }
    }
  • The implementation details of the algorithms above can be checked against the source code; if anything here is inaccurate, comments and corrections are welcome!