This paper is to introduce what is BF algorithm, KMP algorithm, BM algorithm one of the three.
There are too many mathematical principles and knowledge involved in KMP algorithm. This paper will only introduce the running process, partial matching table and next array of KMP algorithm. If you understand these three points, you can certainly have a clear understanding of other articles about KMP algorithm.
Please read the following text description with video animation
The video doesn’t show up directly…
Video address: www.bilibili.com/video/av603…
define
The Knuth-Morris-Pratt string search algorithm, abbreviated as KMP algorithm, is often used to find the occurrence position of a pattern string P within a text string S.
This algorithm was jointly published by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977, so this algorithm is named after the surnames of these three people.
Does the name Donald Knuth look familiar? Yes, in front of this is probably the best article on Knuth shuffling algorithm also appeared in one of his articles!
The following is the operation flow of KMP algorithm:
- Suppose the text string S now matches the I position and the pattern string P matches the J position
- If j = -1, or if the current character matches successfully (S[I] == P[j]), I ++, j++, continue to match the next character; If j! = -1 and the current character fails to match (that is, S[I]! = P[j]), j = next[j]. This means that the pattern string P moves j-next [j] bits to the right relative to the text string S when mismatching
- In other words, move the index position of the pattern string P corresponding to the value of the next array at the pattern string P mismatch to the mismatch
Can’t see it? Straight to the animation!
Operation process
Take the following text string S and pattern string P as an example:
First, list all substrings of the pattern string P:
a | |||||||
---|---|---|---|---|---|---|---|
a | b | ||||||
a | b | a | |||||
a | b | a | a | ||||
a | b | a | a | b | |||
a | b | a | a | b | c | ||
a | b | a | a | b | c | a | |
a | b | a | a | b | c | a | c |
Then, find all the prefixes and suffixes for each substring.
A prefix refers to all combinations of the headers of a string except the last character. Suffixes refer to all tail combinations of a string except the first character.
Take the fifth column as an example.
The prefix for
a | |||
---|---|---|---|
a | b | ||
a | b | a | |
a | b | a |
The suffix for
b | |||
---|---|---|---|
a | b | ||
a | A | b | |
b | a | a | b |
Therefore, the maximum length of the common element of its prefix suffix is 2.
The maximum length of the common elements of each prefix suffix corresponding to the substring of the original pattern string P is shown in the following figure.
Evaluate the next array based on the maximum length table: The next array is equivalent to moving the maximum length value one bit to the right, and then assigning the initial value to -1.
Now that we have the next array, the operation of the KMP algorithm is clear.
The pattern string P is matched one by one with the letters of the text string S, and when mismatched, the pattern string moves to the right.
How do you move it?
For example, if b of the pattern string is mismatched with C of the text string, find the corresponding value in the next array of the pattern string, which is 0, and move the index 0 to the mismatched position.