“This is my 33rd day of participating in the First Challenge 2022. For details: First Challenge 2022.”


Hello, everyone. I’m your handsome egg.

Today to learn can let children stop crying KMP, is not very panic?

Originally in the plan of this egg, THE order of KMP will be put later, but I can’t resist the bitch to ask.

What are you talking about? Just arrange it! For today, go straight to string pattern matching!

Your needs are the direction of my liver! I can’t get a message from my little brother up there. It doesn’t look like iron egg powder.

This guarantee arrangement is clear, the article in addition to understand the essence of KMP algorithm, take to deal with the examination questions also guarantee no problem!


String pattern matching is one of the most important operations in a string, which is to find the position of the pattern string in the main string.

Smelly treasure: Uh, what’s a main string? What’s a mode string?

Balls: If you are looking for the position of string B in string A, then string A is the main string and string B is the pattern string. It is understood that a main string is a large string and a pattern string is a small string.

There are many string pattern matching algorithms, but when it comes to string matching, the first thing that comes to mind is watching porn, uh, THE KMP algorithm.

KMP algorithm matching string is very efficient, but the principle is a bit complex, learning brain buzzing, is the student learning data structure and algorithm can not go around the difficulty.

But don’t panic. It’s a cool thing to know. Take out your toilet paper, uh, scratch paper, and work on it.


In order to learn the KMP pattern matching algorithm, first understand how stupid the previous matching, so let’s look at brainless violent matching.

Violence pattern matching

Violent pattern matching, the poster child for brainless flow, the uncrowned king of the helpless.

Assuming the length of the main string is N and the length of the pattern string is M, violent pattern matching is the subscript of the main string 0, 1, 2, 3… The n-m element is the beginning of the pattern string and is matched with the pattern string in turn to see if there is a complete match.

Let’s look at an example.

As can be seen from the figure above, violence matching is to do pattern string matching under the length of the main string.

In the best case, the match is successful from the first time, and the time complexity is O(1).

The worst case scenario is that the pattern string matches to the last one every time before it is different from the main string. For example, the main string is “aaaaab” and the mode string is “aab”.

If you look at the graph above, except for the last one, every time you get to the end of the match, ah, we’re different.

In this case, in the figure above, the pattern string is the first 3 times, and each time it matches 3 times, and does not match until the fourth time, all matches, no further movement is required, so the number of matches is (6-3 + 1) * 3 = 12.

Thus, for the main string length of N and the mode string length of M, the worst-case time complexity is O((n-m + 1) * m) = O(n * m).

You see, violent pattern matching is so inefficient, you can live with it?

I can’t stand it. Come on, watch porn!

KMP pattern matching

The full name of KMP algorithm is Knuth Morris Pratt, which was developed by D.E.Knuth, J.H.Morris and V.R.Pratt together.

The purpose of KMP algorithm is to quickly find the pattern string from the main string, the emphasis is fast, that how fast?

That must be removing the “no-brain” part of the violence pattern match.

The figure above shows the first 2 steps of violence pattern matching, with comparison Pointers I and J.

As you can see, on the first match, the comparison Pointers I and J go to subscript 5 to find the mismatched element, and then on the second match, I returns to subscript 1.

If you continue to match, you’ll see that the comparison pointer I is constantly going back and forth, as in the figure above, I went to 5 on the first match and back to 1 on the second match.

This action is called backtracking, and the main bad guy that makes violent pattern matching inefficient is frequent comparison pointer backtracking.

What THE KMP algorithm does is compare Pointers without going backwards, just moving the pattern string backwards.

So let’s see how this works. I’m going to take this part a little bit slower to get a feel for it.

Again, take the primary string S = abcabcabda and the pattern string T = abcabd.

The figure above shows the first match between the pattern string and the main string. The elements that do not match at I = 5 and j = 5 are all matched before that.

If you look at the pattern string, you will see that in the matching parts, there are parts with the same prefix and suffix.

The matching of the second step can be done as shown below, which is the core of the KMP algorithm.

Does that make sense? The prefix slides right into the place of the suffix.

Why is that possible?

Because it was compared the first time, the first and second elements of the pattern string are exactly the same as the third and fourth elements in the part that exactly matches the main string.

At this point, the comparison pointer I of the main string continues to be compared from I = 5, and the comparison pointer J of the mode string does not need to be directly compared from 0, but can be compared from the position j = 2.

You see, a lot of meaningless matching and backtracking has been removed, which has greatly improved efficiency.

So the problem now is to find the longest common prefix and suffix of the pattern string in the previously matched part every time there is a mismatch.

That is, when I encounter a mismatch, I slide the pattern string from the longest prefix to the longest suffix (in fact, the comparison pointer J moves from the longest prefix to the longest suffix). And the array that holds that location is the next array.

Next numerical

Solving the next array has always been a big problem, and it’s not just when you’re writing code, it’s a must-do for any college or graduate school exam that involves data structures and algorithms.

Originally wanted to write I have been commonly used before the next method, then accidentally saw the b station up main lunar lantern method, I feel like it is easier to understand some, and then learn to share with you.

Take the pattern string T = ababc here.

Schools with books, speak KMP, most of the array subscript 1, not 0, so in order to more convenient, the interpretation of the interpretation of this example I’ll default array starts from the subscript 1 (if you’re used to start from 0, is the outcome of an array starting at 1, each value – 1 as below).

(1) Write the prefixes of the pattern string T

(2) For all prefixes of pattern string T, find the longest common prefix for each prefix. And the longest common prefix or suffix should be shorter than the original character (it doesn’t make sense if it’s the same length, since we’re sliding).

Here we use the string “abab” as an example, when the comparison pointer points to the last digit, there is a mismatch:

Therefore, for the first three “ABA” characters, first find the common prefix with length 2, the longest prefix is “ab”, the longest suffix is “ba”, obviously prefix ≠ suffix; The longest prefix is “a”, the longest suffix is also “a”, the longest prefix = the longest suffix, so “abab” the longest common prefix = 1.

Normally, the pattern string slides from the longest common prefix a to the longest common suffix A, and the comparison pointer J is recompared from 2.

When the pointer points to the last digit, there is a mismatch:

For the first four “abab” characters, first find the common prefix with length 3, the longest prefix is “ABA”, the longest suffix is “bab”, the two are not equal; The longest prefix is “ab” and the longest suffix is “ab”, so the length of the longest common prefix in the string “ababc” is equal to 2.

So, the pattern string slides from the longest prefix ab to the longest suffix ab, and the comparison pointer J is recompared starting at subscript 3.

I don’t know if you can see any patterns in these two graphs:

Compare the position of the pointer j = the length of the longest common suffix + 1.

So the next array of the final pattern string T = ababc looks like this:

And the reason why the first one is 0, you can think of it as when the first one doesn’t match, it doesn’t have anything in front of it, it’s called -1, so it’s going to be -1 + 1 = 0.

So the next value of the entire pattern string T is complete.

If you’re used to starting at 0, this is the result of an array starting at 1 for each value -1:

# Code starts with subscript 0. def getNext(T): Next = [-1] * len(T) # next[0] = -1, If j == -1 or T[I] == T[j]: if j == -1 or T[I] == T[j]: if j == -1 or T[I] == T[j]: Next [I] = 1 next[I] = 1 next[I] = 1 else: j= next[j] return nextCopy the code

Matching operation implementation

Now that we have the next array, let’s split the implementation of KMP pattern string matching.

Here the main string uses S = abcabcabcabda, and the pattern string uses T = abcabd as an example (the default array starts at subscript 1).

As you can see from the figure above, the first mismatch occurs when I = 4, j = 4, and next is 2.

This means that, for the second time, the match starts with the element at subscript 2 of the pattern string and the current position of the main string, as shown below.

The figure above shows that there is still a mismatch, and at this point the value of next is 1, which means that for the third time, the element at the subscript 1 of the pattern string is matched with the element at the current position of the main string, in the form shown below.

And then the third time.

Look at the figure above, the third mismatch occurs when I = 6, j = 3, and the next value is 1, which means that the fourth mismatch is from the position of the pattern string j = 1 to the current position of the main string, in the form shown below.

At this point, the pattern string still does not match the element on the main string. At this point, next is 0. When the value of next is 0, it is equivalent to comparing the current element of the pattern string with the next element on the main string, as shown below.

And then the fifth time, it found a match.

Def KMP(S, T): I = 0 j = 0 next = getNext(T) while I < len(S) and j < len(T): T[0] = T[0] = T[0] = T[0] = T[0] = T[0] If j == -1 or S[I] == T[j]: I += 1 j += 1 # next: j = next[j] # if j == -1 or S[I] == T[j]: I += 1 j += 1 # return i - j else: return -1Copy the code

KMP’s explanation to this is all over, really not easy ah, write for a long time for a long time.

I think KMP all speak clearly, read to protect you no matter learn, or exam, interview have passed.

You see, KMP is not that hard to learn!

A quick reminder: I illustrated next and KMP with arrays starting at 1, so I encourage you to manually simulate arrays starting at 0 on scratch paper.

After all, the code I gave you, the array starts at index zero, and I just want to see if you really get it.

I’m the first person with good intentions.

The code about KMP is not very long, we suggest that according to the principle of writing a set of their own KMP board, when the time comes to use directly out of the set.

Still that sentence, can see this is true love, give me a thumbs-up to brush up ~

I’m balls. I’ll see you next time!