String hashing is a technique for converting a string to a hash number.

Why do we need it? Sometimes for larger strings, you can compare the number of generated hashes without comparing each string.

Imagine that you need to compare two lines of the following 👇 string:

const A = "imagine that this is a string of 200 characters"

const B = "imagine that this is a string of 200 characterz"
Copy the code

If they have 200 characters. If you compare them for equality, you can use a violent method to compare each character for equality.

Like this:

const A = "imagine that this is a string of 200 characters"

const B = "imagine that this is a string of 200 characterz"

function equal(A, B) {
  let i;
  for(i = 0; i < A.length; i++){
    if(A[i] ! == B[i]) {return false; }}return true;
}

console.log(equal(A,B));
Copy the code

This is not optimal for large string comparisons, and the performance is O(min(A, B)).

Of course, we can add conditions to compare the size of A and B, so that it becomes O(n). Something like this:

function equal(A, B) {
  if(A.lenght ! == B.length)return false;
  let i;
  for(i = 0; i < A.length; i++){
    if(A[i] ! == B[i]) {return false; }}return true;
}
Copy the code

As I said, the worst algorithm is order n. Imagine comparing large strings of different sizes.

String hashing is a technique that converts our string to an integer as a hash number.

Therefore, we will compare two hashes hash(A) == hash(B) with algorithm complexity O(1). This is the best way to compare two strings.

The formula for string hashing

pwithmAre prime numbers, s[0], S [1], s[2]… Is the value of each character. The value of the character in this case is the character encoding.

P: It is a prime number, roughly equal to the number of different characters used. For example, the hash of the word we want to calculate contains only lowercase letters of the English alphabet, and 31 is a good number. If you include capital letters, 53 is a good choice.

M: Generate two random numbers, the larger the number, the smaller the probability of the same occurrence. This variable should also be a prime number. 10 ^9 + 9 is a common choice.

Such as:

p = 31
m = 1e9 + 9
word = 'apple'
Copy the code

We want to know the hash of the word apple and plug the data into the formula, like this:

And then:

Character encoding values:

"a" = 97
"p" = 112
"p" = 112
"l" = 108
"e" = 101
Copy the code

Substitution formula: In the end:The hash value of the word apple is 96604250.

Using JavaScript:

function hash(word) {
    var p = 31;
    var m = 1e9 + 9;
    var hash_value = 0;
    for(var i = 0; i < word.length; i++) {
        var letter = word[i];
        var charCode = letter.charCodeAt();
        hash_value = hash_value + (charCode * Math.pow(p, i))
    }
    return hash_value % m;
}

console.log(hash("apple"));
Copy the code

Original text: jorgechavez. Dev / 2020/11/12 /…