kmp

The KMP algorithm is an improved string matching algorithm proposed by D.E.K Nuth, J.H.Morris and V.R.Pratt, so it is called the Knut-Morris-Pratt operation (abbreviated KMP algorithm). The core of KMP algorithm is to use the information after the failure of matching to reduce the matching times of pattern string and main string to achieve the purpose of fast matching. This is done through a next() function that itself contains local matching information for the pattern string. The time complexity of KMP algorithm O(m+n).

Watch porn algorithm

Before writing this article, I have also seen many articles about KMP written by people, but I always feel that the explanation of the next array is always ambiguous and not easy to understand, so here, novice programmers want to try to write an understanding of the KMP algorithm.

Seem to have pain points

If I have a source string S “abcdhabcdfj” and a substring T “abcdf” and I want to find the position of T in S, how do I write the algorithm to find it.

The simplest kind of violence backtracking, bit-by-bit matching

S - > | h | a b c d a b c d f j 1 T - > a b c d | | f - > 4 f match error, then T clusters need to move to the right to continue to match 2 T - > a b c d f - > 0 a matching error, T string to the right, T -> a b C d f jCopy the code

It is obvious that the matching efficiency is very low, so the above (KMP is presented to promote efficiency, we can see from the above matching process, S string matching process is actually don’t need to do anything that moves, we from beginning to end is conducted on the T series mobile, KMP is on T series of mobile optimization.

Take the veil off KMP

From the perspective of God, we can observe that abCD is matched, and f fails to match. We can think for a moment that if we operate manually, we will definitely not match the first four bits one by one, but will directly start matching from F bits, so that we can basically find out the core of KMP. How do we know the correct traceback position, which is the next array, also known as the matching table, we need to find next in advance, and then use the Next array to get the correct traceback position when matching S and T strings.

How do I figure out the next array

First we need to introduce the concepts of prefixes and suffixes. Let’s look at a table:

T    a  b  a  b  c  d  a  c 
i 0  1  2  3  4  5  6  7  8
k -1 0  0  1  2  0  0  1  0
Copy the code

In the table above, I represents the subscript and k represents the length of the prefix and suffix match. As for why K starts at -1, which is what I’m most confused about understanding the next array, k starts at -1 to indicate a nonexistent value and the string must be moved to the right.

Let’s talk about prefixes and suffixes:

  • A has only one character, so set the matching length to 0
  • The matching length of ab prefix [a] and suffix [b] is 0
  • Aba prefix [a, ab], suffix [ba, a] A matches so length 1
  • Abab prefix [a, ab, ABA] and suffix [bab, ab, b] ab match, so the length is 2
  • Ababc prefix [A, ab, ABA, abab] and suffix [babc, ABC, BC, c] are not matched, so the length is 0
  • Ababcd The prefix [A, ab, ABA, abab, ababc] and the suffix [babcd, abcd, BCD, CD, d] are not matched, so the length is 0

And so on and so forth, and there’s a little bit of a misunderstanding here, where the prefixes and suffixes are not palindromic strings, not like abba, but like abab.

Next we implement the next fetch:

func getNext(s string) []int {
  l := len(s) + 1 // The 0 bit is saved to -1, so one more bit is needed to determine whether -1 is performed to move s to the right
  next := make([]int, l, l)
  next[0] = - 1
  var (
  	k = - 1 // The initial value k is the length of the prefix and suffix matching
  	i int
  )
// Why l-1, because the initial value is set to -1
  for i < l - 1 {
  	if k == - 1 || s[k] == s[i] {
  		k++
  		i++
  		next[i] = k
  	} else {
  		k = next[k] // The location of the traceback}}return next
}
Copy the code

When we implement the next array, we actually use the principle of backtracking to get the correct backtracking position, and then we can use next to implement KMP:

func kmp(source, target string) int {
  var (
  	k = - 1
  	i int
  )
  next := getNext(target)
  for k < len(target) && i < len(source) {
  	if k == - 1 || source[i] == target[k] {
  		k++
  		i++
  	} else {
  		k = next[k]
  	}
  	if k == len(target) {
  		return i - k
  	}
  }
  return 0
}
Copy the code

And what we see here is that we actually use the principle of double backtracking to get to the right position.

Novice on the road for the first time, there are wrong places please gently spray, welcome everyone to correct my mistakes.