Data structure hash table

What is a hash table

A Hash table, also known as a Hash table, is a data structure that is directly accessed based on a key-value. It is based on array, by one of the keywords is mapped to array in the table below to speed the search speed, but also, and array data structure, linked lists, trees, and so on in the data structure of a keyword search, usually to traverse the entire data structure, which is O (N) time, but for a hash table, only O (1) time. Note that an important problem here is how to convert the keyword to the index of the array. This conversion function is called the hash function (also known as the hash function). The conversion process is called hashing. Hash tables are used to quickly determine whether an element is present in a collection. The image hash function hashes a large range of numbers into a small range of numbers corresponding to the index of the array. When a hash function is used to insert data into an array, the array is a hash table. Hash collision chain address method — (zipper method) The basic idea of this method is to form all elements with hash address I into a single linked list called a synonym chain, and the head pointer of the single linked list is stored in the ith element of the hash table, so the search, insertion and deletion are mainly carried out in the synonym chain. The chained address method is suitable for frequent inserts and deletions. Advantages of linear detection method: easy to implement; Always find a spot (if there is one); When the table is not very full, the average performance is very good. Disadvantages: The “cluster” or “cluster” keys can be formed in the adjacent slots of the table. When these clusters fill a large portion of the entire array, performance deteriorates significantly, because the probe sequence performs what is essentially an exhaustive search of the majority of the array. The advantage of a hash table is that it can be accessed randomly by subscript, just like an array. Fixed the problem that arrays could only be accessed through integer indexes. Disadvantages of hash tables The hash function is used to calculate the corresponding hash value before accessing data, which consumes more memory. The load factor is one measure of hash table performance. Since a hash table is actually an array, there is a limit to how much data it can carry, and hash collisions are bound to occur when the hash table stores more data than the length of the array index.

Three common hash structures When we want to use hash to solve a problem, we usually choose the following three data structures. In C++, set and map respectively provide the following three data structures. Their underlying implementation and advantages and disadvantages are listed in the following table: Image red-black tree is a balanced binary search tree, so the key value is ordered, but the key cannot be modified. Changing the key value will lead to the confusion of the whole tree, so it can only be deleted and added.

The picture

STD :: Map and STD :: Multimap keys are also ordered (this question is also often used as an interview question to test understanding of the underlying language container). When using sets to solve hash problems, use unordered_set first, because it is the most efficient to query and add and delete. If you want the set to be ordered, use set. If you want the set to be ordered, use multiset. So let’s look at a map, where a map is a key value data structure. In a map, there are restrictions on keys, but there are no restrictions on values, because keys are stored using a red-black tree. So to summarize, we need to think about hashing when we’re trying to quickly determine if an element is in a set. But hashing also sacrifices space for time, because we need to use extra arrays, sets, or maps to store data in order to achieve fast lookup.

This article is published by OpenWrite!