Dear friends, long time no see, today we will open the column “Data Structure and Algorithm”, we will sort out the important knowledge points of the core course together, so let’s start

The body of the

“String matching” is one of the basic tasks of computers. For example, I have a string “aaaaaaca”. I want to know whether it contains another string “aaaac”.

This is where “string pattern matching algorithms” are used, the most common being the traditional “Brute-Force” and “KMP” algorithms respectively.

BF algorithm design idea

1. Primary and pattern strings are compared character by character


2. When “different strings” (i.e., “mismatch”) occur, the comparison position of the main string is reset to the next character from the start position, and the comparison position of the pattern string is reset to the start position


3. If the match is successful, return the starting position of the matching string in the main string, otherwise return an error code

BF algorithm design defects and solutions

In the BF algorithm, each mismatch needs to be backtracked to the next character from the start character of the last comparison. Through observation, it is found that “some part” of matching is not necessary to continue the comparison when backtracking, which can reduce the “time complexity” of the algorithm.


The appearance of “KMP” algorithm effectively solves the defect of BF algorithm. KMP algorithm is jointly proposed by D.E.Knuth, J,H,Morris and V.R.Pratt.

However, compared with BF algorithm, this algorithm is not easy to understand. There are many explanations on the Internet, but there are few pictures, which always feel almost meaning. Below, I introduce the design idea and working principle of KMP algorithm in detail by drawing

Design idea of KMP algorithm

When character comparison is not equal in the matching process, the comparison position of “main string S” is not backtracked, and the comparison position of “mode string T” is moved


In the matching process, there is a difficult problem to solve: how to calculate the number of moving digits when the “pattern string T” is mismatched? The “partial matching function” was designed through the research of three outstanding people.

Partial matching function

Partial matching function is the most difficult part of KMP algorithm. First, you need to understand the concepts of “prefix”, “suffix”, and “maximum common length”.

· Prefix: refers to all header combinations of a string except the last character

· Suffix: refers to all tail combinations of a string except the first character


· Maximum common length (partial matching value) : refers to the maximum common element in the prefix and suffix, 0 if no. For example, the prefix of abab is “A”, “ab”, “ABA”, the suffix is “B”, “ab”, “bab”, and the maximum common element is “ab”. Therefore, the maximum common element is 2

Review the matching process of KMP algorithm:


The part marked by the red line is exactly the part that has been matched during the mismatch. The maximum common element of “AAAA” is “AAA”, and these characters are directly skipped without repeated comparison


During the code implementation, the position of j after moving = the subscript of the starting position of pattern string T + partial matching value. Generally, the starting subscript is 0, so the position after j moves = partial matching value, that is, j = next[j], next[j] is the “partial matching function”, and j is the position when the mismatch occurs

So what follows is the implementation of the partial matching function. Enumerates the maximum common length of all substrings starting with the first character of “aaaAC” to form a “partial match table”, which describes the relationship between the subscript j and the partial match value in case of mismatching


Partial matching table is realized through the self-matching of pattern string T:


Sample code (C) :

/*KMP matching algorithm */

int KMPCompare(HString parent, HString child, intpos) {

 int next[255];

 Get_Next(child, next);

 int i = pos - 1;

int j = 0; //j is used for the starting position in the child string

 while (i < parent.length && j < child.length) {

  if (j == 0 || parent.ch[i] == child.ch[j]) {

   ++i;

   ++j;

  }

  else {

j = next[j]; // I does not change, j receded

  }

 }

 if (j == child.length) {

  return (i + 1) - j;

 }

 return 0;

}



/* Partial matching function implementation */

void Get_Next(HString child, int * next) {

 int i = 0;

 int j = -1;

next[0] = -1; // Do not use

 while (i < child.length) {

  if (j == -1 || child.ch[i] == child.ch[j]) {

   ++i;

   ++j;

   next[i] = j;

  }

  else {

   j = next[j];

  }

 }

}



void main() {

/* Use KMP algorithm to match string */

 HString parent, child;

 StrAssign_HeapString(&parent, "BBC ABCDAB ABCDABCDABDE");

 StrAssign_HeapString(&child, "ABCDABD");

 printf("Index = %d\n", KMPCompare(parent, child, 1));

}

Copy the code

Attention can improve learning efficiency! I’m Aime bacteria. See you next time! Peace~

Progress a little bit every day, slow down to go faster