This article is only a summary of learning, there may be mistakes, welcome to point out. Arbitrarily reprinted.
Given a string str1 and a string str2, find the first position in string str1 where the string str2 occurs (starting from 0). If none exists, -1 is returned.
str1 = aaaaabcabc
str2 = abcabcaa
Some time ago, I accidentally came into contact with the algorithm explanation video of Zuoshen. In about three days, I watched the KMP algorithm repeatedly for three times. Finally have some of their own understanding and experience. Using the traditional KMP algorithm to do string matching is actually an optimization of the violence algorithm with the next array. Another way to understand the KMP algorithm is to understand it as dynamic programming, which is not described in detail here.
Here I divide it into three parts.
- Violent solution
- KMP algorithm
- How do I find the next array
Violent solution
Violence algorithm looks very simple, the actual coding still needs to deal with some details, I suggest to write. This gives str1 an I pointer and str2 a j pointer. The first initial position of I is 0, and the last initial position is str1.length-1.
-
Str1 [I] and str2[j] are equal: both I and j move back one bit.
-
Str1 [I] and STR2 [j] are unequal, where j returns to 0 and I is compared from the next initial position.
If j can reach length, it indicates that bit 0 is equal to bit str2. Length-1. In this case, return i-j, which is the index of the first position of str2 in str1.
If I reaches the last initial position, str1.length-1, and there is no match, then str2 will never be matched. At this point it returns -1.
Code:
public int strStr(String str1, String str2) {
int length1 = str1.length();
int length2 = str2.length();
if(length2 == 0) return 0;
if(length1 < length2) return- 1; int i = 0;while(i < length1){
int j = 0;
while(i < length1 && j < length2
&& str1.charAt(i) == str2.charAt(j)){
i++;
j++;
}
if(j == length2){
return i-j;
}
i = i - j + 1;
}
return- 1; }Copy the code
I suggest you do it.
KMP algorithm
Leave aside how next came to be. You need to know that it stores some information about str2. Its value is equal to the maximum value of the prefix equal to the suffix of the string formed by all characters before str2. It’s very convoluted here. Here’s an example:
When index is 6, the string is a, b, C, a, B, and c.
When the prefix and suffix are 1, the prefix is A and the suffix is C, respectively. Next can’t take 1.
When the prefix and suffix are 2, the prefix is ab and the suffix is BC. Next cannot be 2.
When the prefix and suffix are 3, the prefix is ABC and the suffix is ABC, which are the same, and next can be taken.
When the prefix and suffix are 4, the prefix is abca and the suffix is cabc, and the suffix cannot be 4 for next.
When the prefix and suffix are 5, the prefix is abcab and the suffix is bcabc, and the suffix cannot be 5 for next.
The prefix and suffix cannot be 6. Because the prefix and suffix cannot be the string itself.
Index: 0 1 2 3 4 5 6 7 8 9
str1 = a a a a a b c a b c
str2 = a b c a b c a a
Next: -1 0 0 0 1 2 3 1
Next is the flow of KMP algorithm. In the violent solution, we still have two Pointers I and j.
- When two elements are equal: I and j move back one bit.
- When two elements are unequal: j = next[j], if next[j] = -1, it indicates that the j pointer has moved to the front.
So let’s take a closer look at these two unequal cases, and that’s the hard part.
next[j] ! = -1, in which case the j pointer jumps directly to str2[next[j]]. Why is this ok? For example.
index 0 1 2 3 4 5 6 7
str1 = a b c f a b c x
str2 = a b c f a b c y
next=-1 0 0 0 0 1 2 3
At index 6, I = j = 7, the two elements are not equal, so we jump j to str2[next[j]], which is j = 3. At this point, the substring before STR1 and the substring before str2 are equal, and they have the same next array. J jumps to 3, which means that the first three digits of the y/x substring are equal to the last three digits. So, the first three digits of our y substring and the last three digits of our x substring don’t need to be compared at this point, because this 3 assumes that they’re equal. The first three bits (index 0, 1, 2) are not compared, and the fourth bit (index 3) is directly compared. This is the core of the Next array. It’s a little bit more intuitive in the left God video.
str1 = a b c f a b c x
str2 = * * * * a b c f a b c y
Compare whether x and f are equal.
Next [j] == -1, in this case, j is already in the front, and there is no way to move forward, so I has to move back.
Code:
public static int getIndexOf(char str1[], char str2[]) {
if(str1.length == 0 || str1.length < str2.length) {
return- 1; }if(str2.length == 0) {
return0; } int i = 0; int j = 0; int next[] = getNextArray(str2); // There are three caseswhile( i < str1.length && j < str2.length) {
if(str1[i] == str2[j]) { i++; // Two elements are equal. }else if(next[j] == -1) {
i++; //next[j] == -1
}else{ j = next[j]; //next[j] ! = 1}}return (j == str2.length) ? i-j : -1;
}
Copy the code
Next array
str2 = a b c f a b c y
next=-1 0 * * * * * *
The first digit defaults to -1. Because the first element doesn’t have a substring.
The second value defaults to 0. Since the substring of the second element has only one element, the maximum number of prefixes and suffixes can only be 0.
And then we have the third digit, and the substring of the third digit is a and b, and that’s the hard part. How to figure out its next value. j = 3
Compare the j-1 next value, cn = next[J-1] str2 with str2[J-1]. So cn is equal to 0, so you’re comparing the 0 element with the 1 element. There must be two cases, equal and unequal. And when it’s not equal, there are two cases.
index 0 1 2 3 4 5 6 7
str2 = a b c f a b c y
next=-1 0 0 0 0 1 * *
And just to visualize it, let me do a different example. j = 6.
cn = next[j-1] = 1, str2[cn] = b
str2[j-1] = b
This time is equal, so next[6] = ++cn = 2. Why is that?
What does this CN stand for? Cn represents the next value of the J-1 bit, which represents the maximum prefix and suffix of the J-1 bit. The maximum value is 1, which means his first and last digit are equal. Then compare its second digit (STR2 [cn]) to the next digit (str2[J-1]). Equal, next[6] = ++cn = 2. What if it’s not equal? There are two cases.
cn > 0,cn = next[cn]
cn<= 0,next[j] = 0
Why is it that we continue to divide in the case of substrings to find cn equal to STR [j-1], if we never find it? Next [j] = 0.
Code:
public static int[] getNextArray(char []str) {
if(str.length == 1) {
return new int [] {-1};
}
int next[] = new int [str.length];
next[0] = -1;
next[1] = 0;
int i = 2;
int cn = 0;
while( i < str.length) {
if(str[i-1] == str[cn]) {
next[i++] = ++cn;
}else if(cn > 0) {
cn = next[cn];
}else{ next[i++] = 0; }}return next;
}
Copy the code
The subtotal
- Violent solution, write more, write more than two times to become proficient.
- KMP concrete implementation, there are three cases. Elements equal, elements unequal and next not equal to -1, elements unequal and next equal to -1.
- There are also three ways to solve next. The corresponding elements of CN and J-1 are equal, the corresponding elements are unequal and CN >0, and the corresponding elements are unequal and CN <=0.
Public id: STUL