What is a hash collision?

A hash maps different inputs to unique, fixed-length values (also called “hashes”). It is one of the most common software operations.

If different inputs have the same hash value, a “collision” occurs.

For example, many web services use hash functions to generate a token that identifies a user’s identity and permissions.

AFGG2piXh0ht6dmXUxqv4nA1PU120r0yMAQhuc13i8
Copy the code

The string above is a hash. If two different users get the same token, a hash collision occurs. The server will treat the two users as the same person, which means that user B can read and change user A’s information, which poses A significant security risk.

One way hackers attack is by trying to create “hash collisions” that break into systems and steal information.

How can hash collisions be prevented?

The most effective way to prevent hash collisions is to expand the hash space.

For a hash of 16 bits, the probability of a collision is 1 in 65536. In other words, if there are 65,537 users, there are bound to be collisions. As the length of the hash increases to 32 bits, the probability of a collision drops to 1 in 4,294,967,296.

Longer hashes mean more storage, more computations, and will affect performance and cost. Developers must make a choice between security and cost.

Parse from String hashCode()

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}Copy the code

Is probably

Suppose n =3
i=0 -> h = 31 * 0 + val[0]
i=1 -> h = 31 * (31 * 0 + val[0]) + val[1]
i=2 -> h = 31 * (31 * (31 * 0 + val[0]) + val[1]) + val[2]
       h = 31*31*31*0 + 31*31*val[0] + 31*val[1] + val[2]
       h = 31^(n-1)*val[0] + 31^(n-2)*val[1] + val[2]Copy the code

Why prime

Prime numbers are chosen to optimally distribute the data in the hash bucket, and the result is uniquely likely to be multiplied by other numbers.

So why 31?

  1. 31 is a moderate prime number, one of the preferred primes to be used as a hashCode multiplier
  2. 31 can be optimized by the JVM,31 * i = (i << 5) - 1

So why not two, seven, 101?

Take String STR = “abcde” as an example

  • At 2 -> 2 to the fifth is 32
  • 101 -> 101^5 = 10,510,100,501

As we know from the above, the larger the hash, the less likely it is to cause a hash collision, but the more space it takes

Finally, let’s look at the result of the prime number 31:31^5 = 28629151, relative to 32 and 10,510,100,501. Isn’t it nice? Not too big

And effective Java has this phrase


The number 31 was chosen because it is an odd prime, and choosing an even number would cause an overflow in the multiplication operation, resulting in a loss of numerical information, since multiplying by two is equivalent to a shift operation. The advantages of choosing prime numbers are not particularly obvious, but it’s a tradition. Meanwhile, the number 31 has the nice property that multiplication can be replaced by shift and subtraction for better performance:
31 * i == (i << 5) - iModern Java virtual machines can do this optimization automatically.

Override hashCode ()

A simple, general-purpose hashCode algorithm is presented in Effective Java

Int result = 17; int result = 17; int result = 17; / / 31

B. Select all equals fields (only equals() fields for comparison), and then calculate the attributes of each field:

  1. If it isbooleanValue is computedf ? 1:0;
  2. If it isbyte\char\short\int, then the calculation(int)f;
  3. If it islongValue is computed(int)(f ^ (f >>> 32));
  4. If it isfloatValue is computedFloat.floatToIntBits(f);
  5. If it isdoubleValue is computedDouble.doubleToLongBits(f), then return long, then use rule (3) to process long, then return int;
  6. In the case of an object application, if the equals method calls the comparison recursively, then hashCode calls the comparison recursively. Otherwise, you need to compute a normal form for the field, for example, if the field is null, then the hashCode value is 0;
  7. If it’s an array, you need to treat each element as a separate field.java.util.Arrays.hashCodeThe hashCode () method contains eight basic types of arrays and reference arrays.

C. Finally, merge the hash code of each field into the hash code of the object.

public class Person {
    private String name;
    private int age;
    private boolean gender;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public boolean isGender() {
        return gender;
    }

    public void setGender(boolean gender) {
        this.gender = gender;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if(o == null || getClass() ! = o.getClass())return false;
        Person person = (Person) o;
        return age == person.age &&
                gender == person.gender &&
                Objects.equals(name, person.name);
    }

    @Override
    public int hashCode() {
        int hash = 17;
        hash = hash * 31 + getName().hashCode();
        hash = hash* 31 + isGender() ? 1-0.hash = hash * 31 + getAge();
        return hash;
    }Copy the code

Reference:

www.ruanyifeng.com/blog/2018/0…

Segmentfault.com/a/119000001…