————— the next day —————
\
What does that mean? Let’s take an example:
In the figure above, the string B is A substring of A, and the first occurrence of B in A is subscript 2 (the first subscript of the string is 0), so 2 is returned.
Let’s look at another example: \
In the figure above, the string B does not exist in A, so -1 is returned.
To unify the concept, we refer to string A as the main string and string B as the pattern string later in this article. \
Xiao Hui’s idea is simple and crude. Let’s use the following example to demonstrate:
In the first round, we start with the first digit of the main string and compare the characters of the main and pattern strings one by one:
Obviously, the first character of the main string is A, and the first character of the pattern string is B, and the two do not match.
In the second round, we move the pattern string back one bit, starting with the second character of the main string and comparing the characters of the main and pattern string one by one: \
The second character of the main string is b, and the second character of the pattern string is also B.
The third character of the main string is B, and the third character of the pattern string is ALSO C, which do not match.
In the third round, we move the pattern string back one more bit, starting with the third bit of the main string, and compare the characters of the main and pattern strings one by one:
The third character of the main string is b, and the third character of the pattern string is also B.
The fourth character of the main string is C, and the fourth character of the pattern string is also C.
The fifth character of the main string is e, and the fifth character of the pattern string is also E.
As a result, the pattern string BCE is a substring of the main string abbcefgh, and the subscript at the position where the main string first appears is 2:
So that’s gray’s solution, which has a name, BF, for Brute Force.
In this case, the first three characters of the pattern string a match the characters of the main string in each round of character matching, until the last character of the pattern string B, the mismatch is found:
As a result, the two strings need to be compared four times in each round, which is obviously wasteful.
Assuming that the length of the main string is M and the length of the pattern string is N, then in this extreme case, the worst time complexity of BF algorithm is O (Mn).
— — — — — —
What does it mean to compare hashes?
Those of you who have used hash tables know that every string can be converted to an integer, using some hash algorithm, and that integer is hashcode:
Hashcode = hash (string) \
Obviously, comparing hashCode with just two strings is much easier than comparing two strings character by character. \
Given the main and pattern strings as follows (assuming the string contains only 26 lowercase letters) :
As a first step, we need to generate hashCode for the pattern string. \
There are various algorithms that generate HashCode, such as:
According to a combined
This is the simplest way, we can think of a as 1, B as 2, and C as 3…… Then add all the characters of the string, and the result is its Hashcode.
bce = 2 + 3 + 5 = 10\
However, although this algorithm is simple, it is likely to cause hash conflicts, such as the same HASHcode for BCE, BEC, and CBE. \
Convert to base 26
Since the string contains only 26 lower-case letters, we can evaluate each string as a 26 base number.
bce = 2*(26^2) + 3*26 + 5 = 1435\
The advantage of this method is that it greatly reduces hash conflicts, but the disadvantage is that it is a large amount of computation and may be beyond the range of integers, and the result needs to be modulo.
For demonstration purposes, we will use the bitwise hash algorithm, so the BCE hashcode is 10:
Second, generate the hashcode for the first equal primary string in the main string.
Since the main string is usually longer than the pattern string, it doesn’t make sense to convert the entire main string to HashCode. It only makes sense to compare substrings of the same length as the pattern string.
So we first generate hashCode, the first substring of the main string that is the same length as the pattern string,
Abb = 1 + 2 + 2 = 5:
Third, compare the two Hashcodes. \
Obviously, 5! =10, indicating that the pattern string and the first substring do not match, we continue the next round of comparison.
** Step 4, ** generates the hashCode of the second equal primary string in the main string. * * * * \
BBC = 2 + 2 + 3 = 7:
Fifth, compare the two HashCodes. \
Obviously, 7! =10, indicating that the pattern string and the second substring do not match, we continue to the next round of comparison.
** Step 6, ** generates the hashCode of the third equal primary string in the main string. * * * * \
Bce = 2 + 3 + 5 = 10:
Seventh, compare the two Hashcodes. \
Obviously, 10 ==10, both hash values are equal! Does that mean that the two strings are equal?
Don’t get too excited, we need further validation due to the possibility of hash collisions. \
Step 8, compare two strings character by character.
The hashCode comparison is just a preliminary verification, and then we need to compare the two strings character by character, like the BF algorithm, and finally determine the two strings match. \
Finally, it is concluded that the pattern string BCE is a substring of the main string ABbcefgh, and the first occurrence of the subscript is 2.
What does that mean? Let’s look at another example:
In the figure above, I know that the hashcode of the substring abbcefg is 26. How do I calculate the hashcode of the next substring, bbcefgd? \
Instead of resuming the substring’s characters, we can take a simpler approach. The new substring is preceded by an A and followed by a D, so: \
New HashCode = old HashCode -1+4 = 26-1+4 = 29
The calculation of the next substring bcefgde is the same: \
New HashCode = old HashCode -2+5 = 29-2+5 = 32
public static int rabinKarp(String str, String pattern){
// The main string length
int m = str.length();
// The length of the pattern string
int n = pattern.length();
// Computes the hash value of the pattern string
int patternCode = hash(pattern);
// Computes the hash value of the first substring of the main string as long as the pattern string
int strCode = hash(str.substring(0, n));
// Compares the hash value of the pattern string with the local hash value of the main string.
// If a match is made, an exact comparison is made; If not, compute the hash value of the adjacent substrings in the main string.
for (int i=0; i<m-n+1; i++) {
if(strCode == patternCode && compareString(i, str, pattern)){
return i;
}
If it is not the last round, update the hash value of the main string from I to I +n
if(i<m-n){ strCode = nextHash(str, strCode, i, n); }}return - 1;
}
private static int hash(String str){
int hashcode = 0;
// The simplest hashcode calculation is used here:
// take a as 1, b as 2, c as 3..... And then we add them in bits
for (int i = 0; i < str.length(); i++) {
hashcode += str.charAt(i)-'a';
}
return hashcode;
}
private static int nextHash(String str, int hash, int index, int n){
hash -= str.charAt(index)-'a';
hash += str.charAt(index+n)-'a';
return hash;
}
private static boolean compareString(int i, String str, String pattern) {
String strSub = str.substring(i, i+pattern.length());
return strSub.equals(pattern);
}
public static void main(String[] args) {
String str = "aacdesadsdfer";
String pattern = "adsd";
System.out.println("Position of first appearance :" + rabinKarp(str, pattern));
}
Copy the code
— — the END — — –
Note: the menu of the official account includes an AI cheat sheet, which is very suitable for learning on the commute.
Highlights from the past2019Machine learning Online Manual Deep Learning online Manual AI Basic Download (Part I) note: To join our wechat group or QQ group, please reply "add group" to join knowledge planet (4500+ user ID:92416895), please reply to knowledge PlanetCopy the code
Like articles, click Looking at the