Hashcode is an eigenvalue generated by the hash algorithm.

A hash algorithm is essentially an idea: compute the eigenvalues of input and output parameters using a function. As an eigenvalue, the purpose of Hashcode is to identify the input parameter. It has a binding relationship with the input parameter, but it does not mean that hashCode is the same as the input parameter, depending on the implementation of the hash algorithm.

Let’s say we want to implement a hash function that uniquely corresponds to hashcode and its input parameter. Since the input parameter is an infinite range of input values, if HashCode is to be unique to different input parameters, the range of HashCode is also an infinite range of input values, which requires a very large space to store, so we have to carefully consider whether the scenario is suitable.

A general hash algorithm can produce the same Hashcode for different input parameters. Just as you can put 10 balloons into 9 chests, there must be two balloons in one of the chests. Similarly, for an infinite set of input parameters with a finite amount of hashcode storage space, There must also be cases where the entry parameter is different but the Hashcode is the same. This is called a hash collision. No matter how powerful a hash algorithm is, as long as you have an infinite input but a non-infinite output set S, you’re bound to have hash collisions. What’s the problem with hash collisions? The query efficiency is reduced.

For example, in Java HashMap, when hash conflicts occur, the chain address method will be used to deal with conflicts. As more and more conflicts occur, the chain will become longer and longer, and the query efficiency may degenerate to O(n). One of the advantages and disadvantages of a hash algorithm is that for many input parameters, Whether the output is evenly distributed on S.

Hashcode can be evaluated in many different ways, such as addition, multiplication, division, bitwise operations, table lookup, and mixing. Generally, different hash algorithms are used for different scenarios. There are some commonly used hash algorithms, such as MD5 and SHA. Java’s HashMap uses HashCode, and one of its put operations simplifies the process:

  1. Calculate hashcode from key
  2. Use hashcode to find the location of the bucket
  3. If there is a conflict, the conflict is resolved, and if there is no conflict, the assignment is completed.

In fact, the process of using Hashcode in other scenarios is similar, because hashcode generated by hash algorithm generally falls in a very wide range. For example, Java will generate an int value ranging from -2147483648 to +-2147483648. For example, the initial range of hashMap is only 16, so we have to mod hashCode according to the actual range of use.

What are the characteristics of the hash algorithm that generates Hashcode?

  1. Input arbitrary length, output fixed length:

The hash function doesn’t have to know what the input means or how long it is, as long as the input hash function produces a fixed-length bit value. The famous SHA256 hash function, for example, produces 256-bit zeros and ones for any value entered. Type in a copy of Romance of The Three Kingdoms or just a letter A, and you get 256 bits of data.

  1. Certainty:

When the hash function is passed the same input value, the return value is the same.

  1. Collision resistance:

Collisions are divided into strong and weak collisions.

  • Strong collision: Find two messages with the same Hashcode for any hashcode
  • Weak collision: Specify hashcode = A and find A message with hashcode = A

So collision resistance also scores in two ways.

  • Strong collision resistance: It is very difficult to find two messages with the same hashcode for any given hashcode.
  • Weak collision resistance: Given hashcode = A, it is very difficult to find A message with hashcode = A.

Hash algorithms with full collision resistance can be used for encryption algorithms, such as SM3 and SHA. However, MD5 and SHA-1 algorithms have been proved not to have strong collision resistance. Therefore, it is better not to use MD5 and SHA-1 algorithms when using encrypted hash algorithms.

  1. Concealment and unidirectional:

The calculation of hash function is unidirectional and irreversible. X derives H (x), but there is no way to reverse it (unidirectional), that is, the hash gives no information about the input x. In other words, the information about X is hidden, which is called concealment.

Due to the hiding and unidirectional nature of the hash function, it can well hide the input parameters. Bitcoin mining also takes advantage of such characteristics. The process of mining in Bitcoin is actually to find an input parameter (H (block header + nonce) ≤target) that meets the conditions. Due to the characteristics of the hash function, miners can only brute force crack and traverse all possibilities, which becomes a competition of computing power.

  1. Unpredictable:

Hashcode cannot simply be extrapolated from the input parameter.