There are many kinds of string matching algorithms. Now WE bring BF and KMP to explain. BF is to make a comparison with KMP algorithm, so that this paper has a progressive relationship from shallow to deep.

BF algorithm

The easiest way to find out if one string is in another is to use two loops. This algorithm is called BF, which stands for Brute Force.

This is BF matching

BF algorithm idea

First, let the text string be S and the pattern string be P, assuming that the text string matches the I position and the pattern string matches the J position.

1, if S[I] == P[j], I ++, j++, continue to match the next character;

2, if the mismatch (i.e. S[I]! = P[j]), then let I = I – (j-1), j = 0.

Because the position of I, calculated according to the formula above, goes back, it is called I backtracking.

And the way I = I – (j-1) comes out of this, the way to think about it is, if you take j steps forward, if you want to go back to where you were, you take j steps back, if you want to go further, you take j – 1 steps back.

Of course, you could write I = I – j + 1 to get back to the origin and go one step further, which is easier to understand.

3. After the match is successful, return to i-j, which can be interpreted as going forward so many steps and then returning to the origin.

BF algorithm code implementation

Here I do it in JavaScript:

// For loop
let str = "abdfadfadfgrereabdgfa";
let pattern = "dfadfg";

function BF(S, P) {
  for (let i = 0; i < S.length;) {
    for (let j = 0; j < P.length;) {
      if (S[i] === P[j]) {
        i++;
        j++;
      } else {
        i = i - (j - 1);
        break;
      }
      if (j === P.length) {
        returni - j; }}}return - 1;
}

console.log(BF(str, pattern));
Copy the code
// Two ways to write the while loop
let str = "ababcdaaabbc ababc";
let pattern = "cda";

function BF(S, P) {
  let i = 0; let j = 0;
  while(i < S.length) {
    if (S[i] === P[j]) {
      i++;
      j++;
    } else {
      i = i - (j - 1);
      j = 0;
    }
    if (j === P.length) {
      returni - j; }}return - 1;
}

console.log(BF(str, pattern));

function BF2(S, P) {
  let i = 0; let j = 0;
  while(i < S.length && j < P.length) {
    if (S[i] === P[j]) {
      i++;
      j++;
    } else {
      i = i - (j - 1);
      j = 0; }}if (j === P.length) {
    return i - j;
  }
  return - 1;
}

console.log(BF2(str, pattern));
Copy the code

KMP algorithm

Next, WE will introduce the KMP algorithm, the full name of KMP is Knuth-Morris-Pratt, Chinese name: Knuth Morris Pratt, the algorithm was conceived by D.E.Knuth and V.R.Pratt in 1974, the same year j.H. Morris independently designed the algorithm, and finally published jointly by the three in 1977. It is an important algorithm in string matching algorithm.

Donald Knuth

The difference between BF and KMP

Let’s look at a picture to compare BF and KMP:

You can see that in the mismatch, BF’s I goes back a long way, and j goes back to zero;

However, KMP keeps I unchanged and only needs to set the position of J according to the conditions and continue to match.

The complexity of BF is O(n*m) and that of KMP is O(n+m), so KMP is a relatively efficient string matching algorithm.

From this comparison, we can see that KMP is efficient because it only needs to move the pattern string.

So how do you find a way to do that?

PMT array

To do this, KMP proposes an array called PMT, which stands for Partial-Match-Table.

This table is generated from the pattern string, which records where the pattern string needs to move when it is mismatched.

Let’s take a look at what the PMT looks like:

You can see that each character in the pattern string corresponds to a number. This number is the PMT value of the character. As mentioned earlier, this PMT value represents the position that the pattern string needs to move when it is mismatched.

How to calculate PMT

So how do you figure out PMT? The answer is obtained by calculating the prefix and suffix of the string:

As a first step, we need to find all of the substrings of the pattern string, and we need to write all of the substrings from front to back (think of it as a set, and therefore include itself).

The second step, find each substring before the suffix combination, prefix is does not contain the last set of character substring, suffix is does not contain the set of the first character substring, then find before the suffix set intersection, and finally find the intersection string length of the longest, the length of the string, is the PMT value we need.

As can be seen from the figure above, in fact, the PMT value corresponding to each character in the pattern string is the maximum string length value in the common set of prefixes and suffixes of this substring.

A little exercise in the maximum prefix and suffix

Here are some exercises to find the largest common prefix and suffix:

What are the patterns from these exercises? Yes, the maximum prefix and suffix of a pattern string are only at the beginning and end, respectively, and they are all equal.

So, what does that mean?

At this point, you can see what PMT really does.

The matching process of KMP

Let’s take a look at the matching process of KMP:

Explain the three points marked in the figure below:

1. In the figure, V represents PMT.

2. When mismatch occurs, find the PMT value of j-1 position, which is 2 in the figure, that is, PMT[J-1] = 2. The characters marked in red are the prefixes and suffixes in the matched substring DFADF.

3. Next, keep I unchanged and set the value of j to PMT[J-1], i.e. 2. This step is to align the prefix of pattern string with the suffix of matched string, which is to make full use of known information to improve the matching efficiency.

PMT array utility summary

Has the largest former suffix, respectively, at the beginning and the end of the original string, we can use this property to record known information, namely have matching characters, if there is greater than 0 PMT value, suggests a need to match the place has been matched to avoid missing, otherwise is 0, j would simply go back to zero, This is where the PMT array comes in.

Generate PMT ideas

Now that we know what PMT is for, how do we use a program to generate PMT?

Ideas:

1.

2,

3,

4,

5. The backtracking of I from match to mismatch is a little more complicated:

6, because I = 0, we can make a comparison, if I! Lambda is equal to 0, and then we’re going to do 5 and 6 again.

This is the general process, and you can do a few exercises to reinforce it:

KMP algorithm code implementation

The following is according to the above ideas with the program to achieve the algorithm:

let str = "abdfadfadfgrereabdgfa";
let pattern = "dfadfg";

/ / for PMT
function partialMatchTable(P) {
  // Initial conditions
  let PMT = [], i = 0, j = 1;
  PMT[0] = 0;
  while(j < P.length) {
    if (P[i] === P[j]) {
      i++;
      PMT[j] = i;
      j++;
    } else {
      if(i ! = =0) {
        i = PMT[i - 1];
      } else {
        PMT[j] = 0; j++; }}}return PMT;
}

/ / KMP subject
function KMP(S, P) {
  let i = 0, j = 0, TMP = partialMatchTable(P);
  while (i < S.length && j < P.length) {
    if (S[i] === P[j]) {
      i++;
      j++;
    } else {
      if(j ! = =0) {
        j = TMP[j - 1];
      } else{ i++; }}}if (j === P.length) {
    return i - j;
  }
  return - 1;
}
console.log(KMP(str, pattern));
Copy the code

I have referred to many excellent articles and videos in the writing process, and I would like to thank them for their help.

If you find something you don’t understand or incorrect while reading this article, please let me know and I will continue to revise it.

The resources

  1. www.bilibili.com/video/av324…
  2. www.cnblogs.com/rubylouvre/…
  3. www.zhihu.com/question/21…
  4. Blog.csdn.net/v_july_v/ar…
  5. Github.com/mission-pea…
  6. www.ruanyifeng.com/blog/2013/0…