Preface

Introduction to hashing

As a back-end developer, it's fair to say that HashMap is the most frequently encountered data structure, and as the name suggests, a HashMap relies primarily on a hash algorithm to store and read data. Take the key code value K as the independent variable, compute a function value through some functional relation H(K) (called the hash function), interpret this value as the storage address of the node, and store the node in that storage unit. On retrieval, you compute the address in the same way and then go to the corresponding cell to fetch the node you are looking for, so nodes can be located quickly by hashing. A storage structure based on hash storage is called a hash table, and a position in the hash table is called a slot.

The core of hash technology is the hash function. For any given dynamic lookup table DL, if an "ideal" hash function H and a corresponding hash table HT are chosen, then for each data element X in DL the function value H(X.key) is where X is stored in HT. The data element X is placed at this position when inserting (or building the table), and looked up at this position when retrieving X. The storage location determined by the hash function is called the hash address. The core of hashing, therefore, is to establish the correspondence between the key code value X.key and the hash address H(X.key) through the hash function, and to organize storage and retrieval around this relationship.
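The relation H(K) above can be made concrete with the simplest division-method hash function. The sketch below is purely illustrative (the name hashAddress and the table size of 10 slots are choices made for this example, not taken from any library):

```go
package main

import "fmt"

// hashAddress implements the division-method hash function described
// above: H(K) = K mod m maps a key code value K to a slot index in a
// hash table with m slots.
func hashAddress(key, m int) int {
	return key % m
}

func main() {
	m := 10 // a hash table with 10 slots
	for _, k := range []int{9, 23, 29} {
		fmt.Printf("key %d -> slot %d\n", k, hashAddress(k, m))
	}
}
```

With m = 10, keys 9 and 29 both map to slot 9, which is exactly the collision situation discussed next.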

As there is no perfect hash function, slot conflicts will inevitably occur as the number of elements grows, so resolving collisions after hashing is itself a problem. Common methods include linear probing, double hashing, and chaining (the "zipper" method). The following figure shows the commonly used chained-address method of resolving hash conflicts: in slot 9, subsequent entries with keys 29 and 9 are linked to that slot as a linked list.

Main text

Beyond optimizing the hash algorithm itself, it can be seen that the hash table must fall back on a conflict-resolution scheme when the number of elements grows too large, which obviously challenges the O(1) search and insert time complexity a HashMap aims for. It is also necessary to expand the hash table when the load factor reaches a certain threshold, as with slot 10 and element 7 in the figure above. In this article, we will discuss how HashMaps are expanded in different languages and software.

Expansion mechanism

The HashMap capacity expansion mechanisms explored in this article come from the different implementations of HashMap expansion in the mainstream programming languages Java and Golang and in the in-memory database Redis.

HashMap expansion in Java

Let's first look at HashMap expansion in Java. As shown in the figure below, when a put operation pushes the load factor of the whole hash table past the threshold, a blocking expansion process begins: first an entry array of twice the capacity is allocated, then the elements of the original array are iterated and rehashed. JDK 1.8 expands by powers of 2, so after rehashing each element either stays at its original index or moves by exactly the old capacity. Therefore, during rehash we only need to check whether the new high bit of the original hash value is 1 or 0: 0 means the same index, and 1 means "old index + oldCap". Because the entire expansion operation is synchronous, expanding a large HashMap is a time-consuming process that blocks the current thread's other work. Therefore, when using large-capacity HashMaps in Java, we often need to estimate the initial capacity in advance and specify an initial value that is a power of 2 as an optimization.
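The "new bit" rule can be transcribed into a short sketch (written here in Go for consistency with the rest of the examples; names like newIndex are illustrative, this is not the JDK source):

```go
package main

import "fmt"

// newIndex applies the JDK 1.8 power-of-two rehash rule: when capacity
// doubles from oldCap to 2*oldCap, an element's new index depends only
// on one extra bit of its hash. If (hash & oldCap) == 0 the element
// stays at its old index; otherwise it moves to oldIndex + oldCap.
func newIndex(hash, oldCap int) int {
	oldIndex := hash & (oldCap - 1) // index in the old table (oldCap is a power of 2)
	if hash&oldCap == 0 {
		return oldIndex
	}
	return oldIndex + oldCap
}

func main() {
	oldCap := 16
	for _, h := range []int{5, 21} { // 21 = 5 + 16: same old index, different new bit
		fmt.Printf("hash %d: old index %d -> new index %d\n",
			h, h&(oldCap-1), newIndex(h, oldCap))
	}
}
```

Here hash 5 stays at index 5, while hash 21 (same old index, but its bit for oldCap is 1) moves to 5 + 16 = 21, matching hash & (newCap − 1).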

Golang HashMap expansion

Golang's hashmap implementation is built on similar principles: a hash bucket array plus the chained-address method (a singly linked list per bucket) to resolve conflicts. The expansion operation in Golang is therefore broadly the same as in Java: create a hash table with twice the previous capacity, then rehash the elements into it. In Golang, however, incremental expansion is adopted to shorten the Map container's response time. Expansion creates a new table twice the size of the old one; moving an old bucket into the new table does not immediately delete it from the old table, but marks it as evacuated. Because this work is completed gradually, part of the data lives in the old table and part in the new table, which affects the processing logic of the hash table's insert, remove, and lookup operations. The old buckets are released only after all of them have been moved into the new table. That is, a get operation searches both the old and the new table, while insert and remove operations additionally migrate (evacuate) buckets of the hash table. If a new expansion request arrives before the current expansion has finished, a second migration is carried out.
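A toy version of this incremental scheme might look like the following. This is a hypothetical sketch, not the Go runtime's actual map code: the type incMap, the per-insert evacuation of exactly one bucket, and the crude load-factor constant are all invented for illustration.

```go
package main

import "fmt"

// incMap grows incrementally: growth allocates a bucket array of twice
// the size, and each insert evacuates one old bucket, so the cost of
// rehashing is spread across many operations.
type incMap struct {
	oldBuckets [][]int // non-nil only while a growth is in progress
	buckets    [][]int
	evacuated  int // how many old buckets have been moved so far
	count      int
}

func newIncMap() *incMap { return &incMap{buckets: make([][]int, 4)} }

func (m *incMap) insert(key int) {
	if m.oldBuckets == nil && m.count >= len(m.buckets)*6 { // crude load-factor check
		m.oldBuckets, m.buckets = m.buckets, make([][]int, len(m.buckets)*2)
		m.evacuated = 0
	}
	m.evacuateOne()
	i := key % len(m.buckets) // new keys always go into the new table
	m.buckets[i] = append(m.buckets[i], key)
	m.count++
}

// evacuateOne moves a single old bucket into the new table per call.
func (m *incMap) evacuateOne() {
	if m.oldBuckets == nil {
		return
	}
	for _, k := range m.oldBuckets[m.evacuated] {
		i := k % len(m.buckets)
		m.buckets[i] = append(m.buckets[i], k)
	}
	m.oldBuckets[m.evacuated] = nil // mark as evacuated
	m.evacuated++
	if m.evacuated == len(m.oldBuckets) {
		m.oldBuckets = nil // growth finished; old table released
	}
}

// lookup must consult both tables while a growth is in progress.
func (m *incMap) lookup(key int) bool {
	if m.oldBuckets != nil {
		for _, k := range m.oldBuckets[key%len(m.oldBuckets)] {
			if k == key {
				return true
			}
		}
	}
	for _, k := range m.buckets[key%len(m.buckets)] {
		if k == key {
			return true
		}
	}
	return false
}

func main() {
	m := newIncMap()
	for k := 0; k < 40; k++ { // triggers one growth partway through
		m.insert(k)
	}
	fmt.Println(m.lookup(7), m.lookup(39), m.lookup(100)) // true true false
}
```

Note how lookup checks the old table first and then the new one, which is exactly the extra complexity the incremental approach imposes on get.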

Redis HashMap expansion

The incremental-expansion idea seen in Golang is implemented even more directly in Redis. In Redis, the dictionary data structure is backed by a hash table, and the entire Redis string key space is stored in such a dictionary. The incremental expansion of this hash table is called progressive rehash. When a dict is initialized, two hash tables, h0 and h1, are created, as shown in the figure below (from Redis Design and Implementation), where h0 serves as the default hash table and h1 is empty by default.

Then, when the load on the hash table reaches a threshold, it is expanded gradually. As the following two figures show, the elements of the old hash table are rehashed into the new one whenever a modification operation touches the corresponding slot. After all elements have been migrated, h0's pointer is pointed at h1, h1 is reset to NULL, and the operation is complete.

Redis uses both the h0 and h1 hash tables during progressive rehash. A lookup searches h0 first; if nothing is found, the search continues in h1. New keys are added only to h1, which guarantees that the number of elements in h0 never increases.
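These h0/h1 rules can be put together in a minimal sketch (illustrative Go code, not Redis's actual dict implementation; the one-slot-per-operation rehash step and the names below are simplifications for this example):

```go
package main

import "fmt"

// dict keeps two tables, h0 and h1, in the style of Redis's progressive
// rehash. While rehashing, each operation also moves one slot of h0 into
// h1; lookups probe h0 first and then h1; new keys go only into h1, so
// h0 never grows. When h0 is empty, h1 takes its place and is reset.
type dict struct {
	h0, h1    [][]int
	rehashIdx int // next h0 slot to migrate; -1 when not rehashing
}

func newDict() *dict { return &dict{h0: make([][]int, 4), rehashIdx: -1} }

func (d *dict) startRehash() {
	d.h1 = make([][]int, len(d.h0)*2)
	d.rehashIdx = 0
}

// rehashStep migrates a single h0 slot, as Redis does on each dict op.
func (d *dict) rehashStep() {
	if d.rehashIdx < 0 {
		return
	}
	for _, k := range d.h0[d.rehashIdx] {
		d.h1[k%len(d.h1)] = append(d.h1[k%len(d.h1)], k)
	}
	d.h0[d.rehashIdx] = nil
	d.rehashIdx++
	if d.rehashIdx == len(d.h0) { // migration finished: h1 becomes h0
		d.h0, d.h1, d.rehashIdx = d.h1, nil, -1
	}
}

func (d *dict) add(key int) {
	d.rehashStep()
	t := d.h0
	if d.rehashIdx >= 0 { // while rehashing, writes go to h1 only
		t = d.h1
	}
	t[key%len(t)] = append(t[key%len(t)], key)
}

func (d *dict) contains(key int) bool {
	d.rehashStep()
	for _, k := range d.h0[key%len(d.h0)] { // probe h0 first
		if k == key {
			return true
		}
	}
	if d.rehashIdx >= 0 { // then fall back to h1 during a rehash
		for _, k := range d.h1[key%len(d.h1)] {
			if k == key {
				return true
			}
		}
	}
	return false
}

func main() {
	d := newDict()
	for k := 0; k < 8; k++ {
		d.add(k)
	}
	d.startRehash() // normally triggered by the load factor; forced here
	fmt.Println(d.contains(3), d.contains(7)) // still reachable mid-rehash
	d.add(100)                                // goes into h1
	fmt.Println(d.contains(100))
}
```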

Conclusion

This article has described how HashMap capacity expansion is implemented in different languages and software. Java's approach is a comparatively "rigid" synchronous expansion: used carelessly, it will block the current worker thread, although this can be mitigated by initializing the HashMap's capacity up front. The hash tables in Golang and Redis, by contrast, are built around the idea of incremental, progressive expansion. Compared with the Java implementation, the advantage is that the long pause caused by expansion on put is avoided, greatly reducing the response time of put operations, at the cost of extra complexity in the get and delete paths. Each has its own strengths. When I knew only Java, I assumed expansion was always blocking; learning about other container implementations has broadened my horizons.

References:

Redis Design and Implementation, Section 4.5, "Progressive Rehash"
Golang Map expansion