This series of articles have been supplemented and improved, and has been revised and organized into a book, The Logic of Java Programming (written by Ma Junchang), published by Huazhang Branch of China Machine Press, which was on the market in January 2018 and received high praise from readers. Major online shops and bookstores are available for sale, welcome to buy: JINGdong self-run link
The previous two sections introduced ArrayList and LinkedList. One common feature of them is that they are inefficient in searching for elements and need to be compared one by one. This section introduces HashMap, which is much more efficient in searching. How does it work? How is it done? This section provides details.
Literally, a HashMap consists of two words, Hash and Map. Map is not a Map, but a mapping relationship. It is an interface.
Next, we’ll look at the Map interface, then how to use HashMap, then how to implement it, and finally we’ll summarize the features of HashMap.
The Map interface
The basic concept
A Map has the concepts of keys and values. A Map stores and accesses values based on keys. Keys cannot be duplicated, that is, only one copy of a key is stored. Using maps makes it easy to handle scenarios where objects need to be accessed by key, such as:
- A dictionary application, the key can be a word, the value can be word information class, including meaning, pronunciation, example sentences and so on.
- Count and record the number of occurrences of all words in a book. You can use words as keys and occurrences as values.
- Manage configuration items in a configuration file. Configuration items are typical key-value pairs.
- Query personnel information according to id number, ID number is the key, personnel information is the value.
Arrays, ArrayLists, and LinkedLists can be considered special maps with keys as indexes and values as objects.
The interface definition
The Map interface is defined as:
public interface Map<K.V> {
V put(K key, V value);
V get(Object key);
V remove(Object key);
int size(a);
boolean isEmpty(a);
boolean containsKey(Object key);
boolean containsValue(Object value);
void putAll(Map<? extends K, ? extends V> m);
void clear(a);
Set<K> keySet(a);
Collection<V> values(a);
Set<Map.Entry<K, V>> entrySet();
interface Entry<K.V> {
K getKey(a);
V getValue(a);
V setValue(V value);
boolean equals(Object o);
int hashCode(a);
}
boolean equals(Object o);
int hashCode(a);
}
Copy the code
The Map interface has two type parameters, K and V, which represent the types of Key and Value respectively. Let’s explain the methods.
Save key-value pairs
V put(K key, V value);
Copy the code
Key saves the value value. If a key already exists in the Map, the Map overwrites the corresponding value and returns the original value. If no key exists in the Map, the Map returns null. Keys are identical in that they are either null or the equals method returns true.
Get values based on keys
V get(Object key);
Copy the code
If not found, null is returned.
Delete key-value pairs by key
V remove(Object key);
Copy the code
Returns the original value of the key, or null if no key exists in the Map.
View the size of the Map
int size(a);
boolean isEmpty(a);
Copy the code
Check to see if a key is included
boolean containsKey(Object key);
Copy the code
Check to see if a value is included
boolean containsValue(Object value);
Copy the code
Batch saving
void putAll(Map<? extends K, ? extends V> m);
Copy the code
Saves all key-value pairs in parameter m to the current Map.
Clear all key-value pairs in the Map
void clear(a);
Copy the code
Gets the collection of keys in the Map
Set<K> keySet(a);
Copy the code
A Set is an interface that represents the mathematical concept of a Set, that is, a Set of elements that are not repeated. It is defined as:
public interface Set<E> extends Collection<E> {}Copy the code
It extends Collection without defining any new methods, but it requires all implementers to ensure that the Set has semantic constraints, i.e., no repeating elements. We’ll talk more about Set in the next section.
There are no duplicate keys in the Map, so ketSet() returns a Set.
Gets the collection of all values in the Map
Collection<V> values(a);
Copy the code
Gets all key-value pairs in the Map
Set<Map.Entry<K, V>> entrySet();
Copy the code
Map.Entry
is a nested interface defined inside the Map interface and represents a key-value pair. The main methods are as follows:
K getey(a);
V getValue(a);
Copy the code
KeySet ()/values()/entrySet() have a common feature. They all return views, not copied values. Changes based on the returned values directly modify the Map itself, for example:
map.keySet().clear();
Copy the code
All key-value pairs are deleted.
HashMap
Using the example
HashMap implements the Map interface. Let’s take a look at how to use it with a simple example.
In the random section, we introduced how to generate random numbers. Now, let’s write a program to see if random numbers are uniform. For example, we can randomly generate 1000 numbers from 0 to 3 and count the number of times each number is generated. The code could be written like this:
Random rnd = new Random();
Map<Integer, Integer> countMap = new HashMap<>();
for(int i=0; i<1000; i++){
int num = rnd.nextInt(4);
Integer count = countMap.get(num);
if(count==null){
countMap.put(num, 1);
}else{
countMap.put(num, count+1); }}for(Map.Entry<Integer, Integer> kv : countMap.entrySet()){
System.out.println(kv.getKey()+","+kv.getValue());
}
Copy the code
The output of a run is:
0269 1236 2261 3234Copy the code
The code is pretty simple, so I won’t explain it.
A constructor
In addition to the default constructor, HashMap has the following constructor:
public HashMap(int initialCapacity)
public HashMap(int initialCapacity, float loadFactor)
public HashMap(Map<? extends K, ? extends V> m)
Copy the code
The last one is constructed with an existing Map and copies all the key-value pairs into the current Map, which is easy to understand. The first two involve two two-parameter initialCapacity and loadFactor. What do they mean? We need to take a look at how HashMap works (based on JDK 7).
Realize the principle of
An internal
Inside a HashMap there are several main instance variables:
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
transient int size;
int threshold;
final float loadFactor;
Copy the code
Size indicates the actual number of key-value pairs.
Table is an array of Entry types, where each element points to a one-way linked list. Each node in the linked list represents a key-value pair. Entry is an inner class, whose instance variables and constructor code are as follows:
static class Entry<K.V> implements Map.Entry<K.V> {
final K key;
V value;
Entry<K,V> next;
int hash;
Entry(inth, K k, V v, Entry<K,V> n) { value = v; next = n; key = k; hash = h; }}Copy the code
Where key and value represent the key and value respectively, next refers to the next Entry node, hash is the hash value of the key, and we will introduce its calculation method in a moment. Directly storing hash value is to speed up the calculation during comparison, and we will look at the code in a moment.
The initial value of table is EMPTY_TABLE, which is an empty table.
static finalEntry<? ,? >[] EMPTY_TABLE = {};Copy the code
When a key/value pair is added, the table is no longer empty and will be expanded as the key/value pair is added. The expansion strategy is similar to that of an ArrayList. When the first element is added, the default size allocated is 16.
Threshold indicates the threshold. If the number of key-value pairs size is greater than or equal to threshold, the value can be extended. How does threshold work out? In general, threshold is equal to table.length multiplied by loadFactor. For example, if table.length is 16 and loadFactor is 0.75, threshold is 12.
LoadFactor is a loadFactor that indicates how much of the table as a whole is occupied. LoadFactor is a floating point number that defaults to 0.75 and can be modified using the constructor.
Let’s take a look at some of the main method code to see how HashMap uses this internal data to implement the Map interface. Look at the default constructor first. It should be noted that we may omit some non-major code for clarity and simplicity.
Default constructor
The code is:
public HashMap(a) {
this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}
Copy the code
DEFAULT_INITIAL_CAPACITY is 16 and DEFAULT_LOAD_FACTOR is 0.75. The main constructor code for the default constructor call is:
public HashMap(int initialCapacity, float loadFactor) {
this.loadFactor = loadFactor;
threshold = initialCapacity;
}
Copy the code
Set the initial values of loadFactor and threshold.
Save key-value pairs
Let’s see how HashMap stores a key-value pair as follows:
public V put(K key, V value) {
if (table == EMPTY_TABLE) {
inflateTable(threshold);
}
if (key == null)
return putForNullKey(value);
int hash = hash(key);
int i = indexFor(hash, table.length);
for(Entry<K,V> e = table[i]; e ! =null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
Copy the code
If it is saved for the first time, the inflateTable() method is called to allocate actual space to the table. The main code of inflateTable is as follows:
private void inflateTable(int toSize) {
// Find a power of 2 >= toSize
int capacity = roundUpToPowerOf2(toSize);
threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
table = new Entry[capacity];
}
Copy the code
By default, the value of capacity is 16, the threshold changes to 12, and the table allocates an Entry array of 16 lengths.
Next, check to see if the key is null, and if so, call putForNullKey alone, which we’ll ignore for the moment.
If the key is not null, the hash method is used to calculate the hash value of the key. The code for the hash method is as follows:
final int hash(Object k) {
int h = 0
h ^= k.hashCode();
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
Copy the code
Based on the return value of the Key’s own hashCode method, some bit operations are performed for the sake of randomness and uniformity.
Once you have the hash value, call indexFor to calculate where to place the key-value pair in the table as follows:
static int indexFor(int h, int length) {
return h & (length-1);
}
Copy the code
In HashMap, length is a power of 2, and h&(length-1) is equivalent to the modulo operation: h%length.
Table [I] = table[I] = table[I] = table[I] = table[I]
for(Entry<K,V> e = table[i]; e ! =null; e = e.next)
Copy the code
Equals equals equals equals equals equals equals equals equals equals equals equals equals
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
Copy the code
Why compare hash in the first place? Since hash is an integer, the comparison performance is generally much higher than equals, and unlike hash, there is no need to call equals, which improves comparison performance overall.
If it can be found, directly change the value in the Entry.
ModCount++, as described in ArrayList and LinkedList, records the number of changes to facilitate detection of structural changes during iterations.
If none is found, the addEntry method is called to add an entry at the given location with the code:
void addEntry(int hash, K key, V value, int bucketIndex) {
if ((size >= threshold) && (null! = table[bucketIndex])) { resize(2 * table.length);
hash = (null! = key) ? hash(key) :0;
bucketIndex = indexFor(hash, table.length);
}
createEntry(hash, key, value, bucketIndex);
}
Copy the code
If the space is sufficient and resize is not required, call createEntry to add the space.
void createEntry(int hash, K key, V value, int bucketIndex) {
Entry<K,V> e = table[bucketIndex];
table[bucketIndex] = new Entry<>(hash, key, value, e);
size++;
}
Copy the code
The code is straightforward: create an Entry object, insert the head of the one-way list, and increase size.
If the space is insufficient, that is, size is about to exceed threshold and objects have been inserted into the corresponding table position, the specific check code is as follows:
if ((size >= threshold) && (null! = table[bucketIndex]))Copy the code
Then call the resize method to expand the table. The expansion strategy is multiplied by 2. The main code of resize is as follows:
void resize(int newCapacity) {
Entry[] oldTable = table;
int oldCapacity = oldTable.length;
Entry[] newTable = new Entry[newCapacity];
transfer(newTable, initHashSeedAsNeeded(newCapacity));
table = newTable;
threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}
Copy the code
Allocate an Entry array with twice the size, call the Transfer method to migrate the original key-value pair, and then update the internal table variable and the value of threshold. The code of transfer method is:
void transfer(Entry[] newTable, boolean rehash) {
int newCapacity = newTable.length;
for (Entry<K,V> e : table) {
while(null! = e) { Entry<K,V> next = e.next;if (rehash) {
e.hash = null == e.key ? 0 : hash(e.key);
}
inti = indexFor(e.hash, newCapacity); e.next = newTable[i]; newTable[i] = e; e = next; }}}Copy the code
The rehash argument is generally false. This code iterates over each of the original key-value pairs, computes the new location, and saves it to the new location.
The above is the main code for saving key-value pairs. Briefly summarize the basic steps as follows:
- Compute the hash value of the key
- According to the hash value to save position (mold)
- Insert into the head of the list at the corresponding position or update an existing value
- Expand the table size as needed
The above description may be abstract, but let’s take a look at it using an example and a diagram. The code is:
Map<String,Integer> countMap = new HashMap<>();
countMap.put("hello".1);
countMap.put("world".3);
countMap.put("position".4);
Copy the code
After creating an object with new HashMap(), the graphic structure in memory looks something like this:
countMap.put("hello".1);
Copy the code
The hash value of “hello” is 96207088 and the result of module 16 is 0, so insert the head of the linked list pointed to by table[0] and the memory structure will change to:
Get values based on keys
The code is:
public V get(Object key) {
if (key == null)
return getForNullKey();
Entry<K,V> entry = getEntry(key);
return null == entry ? null : entry.getValue();
}
Copy the code
GetForNullKey (); getEntry(); getForNullKey(); The node’s getValue() method is then called to get the value. The code for the getEntry method is:
final Entry<K,V> getEntry(Object key) {
if (size == 0) {
return null;
}
int hash = (key == null)?0 : hash(key);
for(Entry<K,V> e = table[indexFor(hash, table.length)]; e ! =null;
e = e.next) {
Object k;
if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
return e;
}
return null;
}
Copy the code
The logic is simple:
- Compute the hash value of the key as follows:
int hash = (key == null)?0 : hash(key);
Copy the code
- Find the corresponding linked list in the table according to the hash code:
table[indexFor(hash, table.length)];
Copy the code
- Iterating through the list to find, iterating through the code:
for(Entry<K,V> e = table[indexFor(hash, table.length)]; e ! =null;
e = e.next)
Copy the code
- Compare one by one, using hash to compare quickly, using equals to compare the same hash.
if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k))))
Copy the code
Check to see if a key is included
ContainsKey has the same logic as get. If the node is not null, it exists.
public boolean containsKey(Object key) {
returngetEntry(key) ! =null;
}
Copy the code
Check to see if a value is included
A HashMap can be used to efficiently manipulate by key, but if you want to manipulate by value, you need to iterate. The containsValue method has the following code:
public boolean containsValue(Object value) {
if (value == null)
return containsNullValue();
Entry[] tab = table;
for (int i = 0; i < tab.length ; i++)
for(Entry e = tab[i] ; e ! =null ; e = e.next)
if (value.equals(e.value))
return true;
return false;
}
Copy the code
If the value is null, containsNullValue is called separately. If the value is not null, containsNullValue is called separately. If the value is not null, containsNullValue is called separately.
Delete key-value pairs by key
The code is:
public V remove(Object key) {
Entry<K,V> e = removeEntryForKey(key);
return (e == null ? null : e.value);
}
Copy the code
The code for removeEntryForKey is:
final Entry<K,V> removeEntryForKey(Object key) {
if (size == 0) {
return null;
}
int hash = (key == null)?0 : hash(key);
int i = indexFor(hash, table.length);
Entry<K,V> prev = table[i];
Entry<K,V> e = prev;
while(e ! =null) {
Entry<K,V> next = e.next;
Object k;
if(e.hash == hash && ((k = e.key) == key || (key ! =null && key.equals(k)))) {
modCount++;
size--;
if (prev == e)
table[i] = next;
else
prev.next = next;
e.recordRemoval(this);
return e;
}
prev = e;
e = next;
}
return e;
}
Copy the code
The basic logic is:
- Calculate the hash and find the table index based on the hash. The code is as follows:
int hash = (key == null)?0 : hash(key);
int i = indexFor(hash, table.length);
Copy the code
- Traverse the table[I] to find the point to be deleted, and use the variable prev to point to the previous node, next to point to the next node, and e to point to the current node. The traversal structure code is as follows:
Entry<K,V> prev = table[i];
Entry<K,V> e = prev;
while(e ! = null) { Entry<K,V> next = e.next;if{// deletereturn;
}
prev = e;
e = next;
}
Copy the code
-
To determine whether or not we find the hash, we still compare the hash to equals
-
The logic of deleting is to make the length smaller and then connect the nodes before and after the point to be deleted. If the point to be deleted is the first node, then make the table[I] point directly to the next node. The code is:
size--;
if (prev == e)
table[i] = next;
else
prev.next = next;
Copy the code
e.recordRemoval(this); The code is empty in HashMap, mainly for use by subclass extension of HashMap.
Summary of implementation principle
The above is the basic implementation principle of HashMap. There is an array table inside, and each element table[I] points to a one-way linked list. According to the key memory value, hash is calculated with the key, and the index position in the array buketIndex is obtained by modulo. Then operate on the one-way linked list pointed to by table[buketIndex].
Hash () = equals (); hash () = equals (); hash () = equals (); hash () = equals (); We need to pay special attention to this point. This is also a key constraint on the hashCode and Equals methods, a constraint we discussed when we introduced wrapping classes.
Analysis of HashMap features
HashMap implements the Map interface, internally using array linked list and hash, which determines its characteristics as follows:
- The efficiency of saving and obtaining values by key is very high, which is O(1). Each one-way linked list usually has only one or a few nodes, which can be directly and quickly located according to the hash value.
- There is no order in key-value pairs in a HashMap because the hash values are random.
A HashMap is ideal if you often need to value by key, and you don’t need the order.
summary
This section introduces the usage and implementation principle of HashMap. It realizes the Map interface and can conveniently store values by keys. Its implementation uses hash and can directly locate according to the keys themselves, with high access efficiency.
Accessing and comparing objects according to hash value is an important way of thinking in computer program. It makes the accessing objects mainly depend on their own hash value, rather than comparing with other objects. The accessing efficiency is independent of the size of set, up to O(1).
However, HashMap has no order. If you want to maintain the order in which you add HashMap, you can use a subclass of HashMap called LinkedHashMap, which we’ll cover later. Map also has an important implementation class, TreeMap, that can sort, which will be covered in a later section.
This section mentions the Set interface, and in the next section, we’ll explore one of its important implementation classes, HashSet.
To be continued, check the latest articles, please pay attention to the wechat public account “Lao Ma said programming” (scan the qr code below), simple and simple, Lao Ma and you explore the nature of Java programming and computer technology. All rights reserved.