A HashMap extension

Before reading Effective Java, I briefly summarized HashMap here: http://www.jianshu.com/p/24c10ce29c85. My reasons for writing this:

  • After reading the HashMap comics, I wanted to reorganize some of my knowledge about hashmaps.
  • On November 29, 2017, our project used an LRUCache, and LeetCode happens to have exactly this problem, so I went and did it. After considering various data structures in turn, I finally arrived at using LinkedHashMap as an ordered Map, so I also looked into LinkedHashMap.

0x00 Hash methods and methods for resolving hash conflicts

When reviewing data structures for the postgraduate entrance exam, I memorized the various terms and worked through all kinds of diagrams and formulas for resolving hash conflicts. But I did not really understand HashMap back then, because I had no development experience. Since I am still a little fuzzy now, let me sort out the terminology. Hash methods, that is, ways of mapping each key's hashCode to a slot index in the table, include:

(1) Direct addressing: for example, in a table of ages 0 to 100, the age itself can be used as the address.
(2) Mid-square method.
(3) Division-remainder method: the key is divided by a number p no larger than the hash table length m, and the remainder is the hash address. The crux of this method is choosing p: it should make the hash value depend on as many bits of the key as possible, and p is best chosen prime.
(4) Random number method.
(5) Modulus method.

Java HashMap uses the well-known & method, similar to (5) but more efficient:

index = HashCode(Key) & (Length - 1)
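A minimal sketch of that formula (class name `IndexDemo` is mine): for a power-of-two table length, masking with `& (length - 1)` produces the same slot as `% length` for non-negative hash codes, while being a single cheap bitwise operation.

```java
// Sketch: for a power-of-two length, (hash & (length - 1)) gives the
// same slot as (hash % length) for non-negative hashes.
public class IndexDemo {
    public static void main(String[] args) {
        int length = 16; // HashMap's default capacity
        int[] hashes = {15, 16, 17, 118, 1_000_003};
        for (int h : hashes) {
            int byAnd = h & (length - 1); // bitwise mask
            int byMod = h % length;       // modulus
            System.out.println(h + " -> & gives " + byAnd + ", % gives " + byMod);
        }
    }
}
```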

Methods for resolving hash conflicts:

Two keys may have the same hashCode, or two different hashCodes may map to the same index; either way, the entries must share a bucket.

(1) Separate chaining (the “zipper” method) left the deepest impression on me. It is the method used by Java HashMap, with head insertion: an element added later is placed at the front of the chain, because the designers assumed that recently added elements are more likely to be accessed, which makes traversing the linked list efficient. I originally thought put would be an O(1) operation, but it is not, because you first need to check whether the bucket already contains an entry for the key, and if so, overwrite its value. A bucket is located by the key's hash; the value is just an appendage of the entry.
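The chaining-with-head-insertion behavior described above can be sketched in a few lines. This is my own minimal illustration (class name `ChainedMap` is hypothetical, not HashMap's real code); it shows why put is not O(1): the chain is scanned first to overwrite an existing key, and only then is the new node head-inserted.

```java
import java.util.Objects;

// Minimal sketch of separate chaining with head insertion (not the real
// java.util.HashMap code). put() first scans the bucket's chain to
// overwrite an existing key; otherwise the new node becomes the chain head.
public class ChainedMap<K, V> {
    static class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Node<K, V>[] table;

    @SuppressWarnings("unchecked")
    public ChainedMap(int capacity) { table = new Node[capacity]; } // capacity assumed a power of two

    private int index(Object key) { return key.hashCode() & (table.length - 1); }

    public V put(K key, V value) {
        int i = index(key);
        // Pass 1: if the key is already in this bucket, overwrite its value.
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) { V old = n.value; n.value = value; return old; }
        }
        // Not found: head insertion — the new node goes to the front of the chain.
        table[i] = new Node<>(key, value, table[i]);
        return null;
    }

    public V get(Object key) {
        for (Node<K, V> n = table[index(key)]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) return n.value;
        }
        return null;
    }
}
```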

0x01 Key points of put

A brief description of the PUT process:

Hash the key with hashCode(), then compute the index. If there is no collision, place the entry directly in the bucket. If there is a collision, append it to the bucket's linked list. If a collision makes a list too long (length greater than or equal to TREEIFY_THRESHOLD), convert the list to a red-black tree. If the key already exists, replace the old value (keys are unique). If the map is too full (size exceeds load factor * current capacity), resize.

Note a few points:

  1. The usual hash method takes a modulus; when a collision (hash conflict) occurs, separate chaining (the zipper method) resolves it.
  2. When a collision makes a bucket's list longer than TREEIFY_THRESHOLD, the list is converted to a red-black tree, and put/get on that bucket becomes O(log n).
  3. With the default capacity of 16 and load factor of 0.75, the map automatically resizes once it holds 16 × 0.75 = 12 entries, which keeps chains short and makes finding things within a bucket easier.

The Little Grey comic “What is a HashMap” mentions a few details:

  1. On get, a later-inserted node sits before earlier entries in the chain (head insertion), because HashMap's creators assumed that later-inserted entries are more likely to be looked up. Note that since a bucket's entries form a linked list, they are ordered, unlike the arrangement of the buckets themselves.

  2. The initial length of a HashMap is 16 (so Length - 1 = 15 = 1111B). The length is a power of 2 because HashMap computes the index not with mod (% Length) but with & (Length - 1), and & is known to be cheaper than %. And when using &, it is best if all the low bits of the mask are 1, so that every slot is reachable and the probability of hash collisions is minimized.
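A small sketch (class name `MaskDemo` is mine) of why the low bits of the mask must all be 1. With length 16 the mask is 1111B and all 16 slots are reachable; if the length were 15, the mask would be 1110B, every index would be even, and half the table would be wasted.

```java
// Sketch: why HashMap's length must be a power of two.
// Mask 1111B (length 16) reaches all 16 slots; mask 1110B (length 15)
// only produces even indices, so half the slots are never used.
public class MaskDemo {
    public static void main(String[] args) {
        java.util.Set<Integer> powerOfTwo = new java.util.TreeSet<>();
        java.util.Set<Integer> notPowerOfTwo = new java.util.TreeSet<>();
        for (int h = 0; h < 1000; h++) {
            powerOfTwo.add(h & (16 - 1));    // mask 1111B
            notPowerOfTwo.add(h & (15 - 1)); // mask 1110B
        }
        System.out.println("length 16 reaches " + powerOfTwo.size() + " slots");
        System.out.println("length 15 reaches " + notPowerOfTwo.size() + " slots");
    }
}
```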

0x02 The circular linked list produced by HashMap under high concurrency

Grey Comics: HashMap with high concurrency

  1. When too many elements are inserted into a HashMap, a resize is required. The condition for resizing is HashMap.size >= Capacity * LoadFactor.

  2. A HashMap resize consists of two steps, expansion and rehash; under concurrency, the rehash may form a loop in a linked list.
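The root cause can be seen single-threaded. Below is my simplified sketch of the JDK 7-style transfer loop (class name `TransferDemo` is mine, and this is an illustration, not the real java.util.HashMap code): head insertion into the new table reverses each chain, and two threads interleaving this reversal can end up linking nodes into a cycle.

```java
public class TransferDemo {
    static class Node { int key; Node next; Node(int k, Node n) { key = k; next = n; } }

    // JDK 7-style transfer pattern for one bucket: head insertion into the
    // new table, which reverses the chain. Under concurrency, two threads
    // interleaving here can link the nodes into a cycle.
    static Node transfer(Node e) {
        Node newBucket = null;
        while (e != null) {
            Node next = e.next;   // a concurrent resize may already have moved e.next
            e.next = newBucket;   // head insertion into the new bucket...
            newBucket = e;        // ...so the chain comes out reversed
            e = next;
        }
        return newBucket;
    }

    public static void main(String[] args) {
        // Old bucket chain: 3 -> 7 -> 11
        Node head = transfer(new Node(3, new Node(7, new Node(11, null))));
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) sb.append(n.key).append(' ');
        System.out.println(sb.toString().trim()); // prints "11 7 3": order reversed
    }
}
```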

After such a resize, a get for a key that does not exist will traverse the cyclic list forever.

0x03 ConcurrentHashMap Solves the thread safety problem of HashMap

Gray Comics: ConcurrentHashMap

Using Hashtable or Collections.synchronizedMap is thread-safe (another difference between Hashtable and HashMap: Hashtable does not allow null keys or values). But both lock the entire collection, blocking all other operations at the same time, not just put and get (and we know HashMap is unsafe precisely because of the loop produced by put and rehash).

ConcurrentHashMap uses segments. Each Segment can be read and written independently without affecting the others. Case 1: concurrent writes to different Segments proceed in parallel. Case 2: concurrent writes to the same Segment: locking happens only when puts contend on the same Segment.
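A quick sketch of the practical effect (class name `ConcurrentDemo` is mine): four threads put 1000 distinct keys each into a ConcurrentHashMap with no external locking, and all 4000 entries survive, which a plain HashMap does not guarantee.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: concurrent writers on a ConcurrentHashMap do not lock the whole
// map (segment locks in Java 7; per-bucket CAS/synchronized in Java 8).
public class ConcurrentDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < 1000; i++) map.put(id + "-" + i, i); // distinct keys per thread
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(map.size()); // all 4000 entries survive
    }
}
```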

Note also that the comics are based on Java 1.7. In 1.8, as mentioned above, a hash collision causes a bucket's list to be converted to a red-black tree once its length exceeds TREEIFY_THRESHOLD, and put/get on that bucket becomes O(log n).

0x04 LinkedHashMap and LruCache

First of all, how can LinkedHashMap store entries in order? We know HashMap hashes keys into the table with the power-of-two & trick, so its bucket (Entry) order is unordered. LinkedHashMap cannot keep order by simply dropping the hashing, or it would no longer be a HashMap; hashing exists to resolve conflicts and give fast access. (Its actual answer: it keeps the hash table and additionally threads all entries onto a doubly linked list.) If entries were stored purely sequentially, it would be something like an ArrayList. Why can't a HashMap store just one datum per bucket like an ArrayList? And why is ArrayList lookup by value only traversal, not O(1) like a Map? Whenever in doubt, look at this hash formula:

index = HashCode(Key) & (Length - 1)

  1. A bucket can be located precisely because its position is the index produced by hashing the key.
  2. A bucket cannot hold only a single datum because hash algorithms always have conflicts. There are two possibilities: (1) different keys with the same hashCode; (2) different hashCodes mapping to the same index. For example, with the default capacity of 16: 1011010101111001B & 1111B = 1001B = 9.

To implement an LruCache, head over to LeetCode 146, LRU Cache.
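As a taste of the idea (LeetCode 146 expects a hand-rolled HashMap plus doubly linked list, so treat this only as a sketch; the class name `LruCache` is mine): LinkedHashMap with accessOrder = true keeps entries in access order, and overriding removeEldestEntry evicts the least-recently-used entry once capacity is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU cache on top of LinkedHashMap: accessOrder = true moves
// an entry to the end of the internal list on each get/put, so the eldest
// entry is always the least recently used one.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // initialCapacity, loadFactor, accessOrder = true (access order, not insertion order)
        super(capacity, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry on overflow
    }
}
```

Usage: in a capacity-2 cache, putting 1 and 2, then reading 1, then putting 3 evicts 2, since 2 is now the least recently used.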