This is the fourth day of my participation in the August More text Challenge. For details, see:August is more challenging
Pattern matching algorithm of data structure series
String pattern matching
Pattern Matching: Substring positioning (lndex (S; T) function)
- Algorithm purpose
Determines the first occurrence of the substring T contained in the main string S
- The initial conditions
The strings S and T exist, and T is a non-empty string
- Operating results
If S has a substring with the same value as T, return its first occurrence in the main string, otherwise return 0.
1. Classical pattern matching algorithm (BF algorithm)
BF algorithm design idea
- 1. Compare the ith (initial I =l) character of main string S with the first character of mode T. If they are equal, continue to compare subsequent characters one by one. If not, start from the next character (i++) of the main string S, and re-compare it with the first character of T.
- 2. A sequence of consecutive substring characters up to the main string S is equal to the pattern T. The return value is the ordinal number of the first character in the subsequence of S that matches T, that is, a successful match. Otherwise, the match fails and the value 0 is returned.
#include<iostream>
#include<stdlib.h>
using namespace std;
//1. Variable length storage of string
typedef struct HString{
char *ch;
int length;
}HString;
/ / 2. BF algorithm
int BruteForce(HString str,HString substr){
int i=1,j=1,k=i;
while(i<=str.length&&j<=substr.length){
if(str.ch[i] == substr.ch[j]){
i++;
j++;
}
else{
j = 1; i = ++k; }}if(j>substr.length)
return k;
else
return 0;
}
int main(a){
return 0;
}
Copy the code
The classical pattern matching algorithm (BF algorithm) sets the length of the main string m and the length of the substring n, then the time complexity of the worst case is O((n-L) * (m-n))=O(n*m). Disadvantages: Every time when there is a mismatch, the pointer of the main string I and the pointer of the substring J have to go back, making the algorithm inefficient.
2. The KMP algorithm
- Feature: When the selection does not match, the main string pointer I does not fall back. Only the substring j pointer is rolled back to the specified position, which is not necessarily the first position of the substring.
- The core of the KMP algorithm is to calculate the position to fall back to when there is a mismatch at the J pointer
- Use the next[] array to store the required location information, where next[k] stores the position that J should fall back to if a mismatch occurs at j=k.
- Solving for the next[] array is string dependent, not main string dependent
How to solve the next[] array?
Manual solution idea:
- Next [l]=0 The first character in the table fails to match, indicating that the main string I is shifted back
- Next [2]=1 The second character in the table fails to match. In this case, the substring j can only go back to the beginning of the string
- The value of next[j] is the string length of FL or FR +1
- FL or FR should be as large as possible
- Solving next[] is actually looking for the maximum common element length for prefixes and suffixes
Algorithm idea:
- Problem description:
How to find next[k +1]?
- If T = T [j], [k] the next [k + l] = next [k] + l = j + l, then k++ j++.
- If T T [j], [k] indicates the fallback j to the next [j], if j = O, is to determine T T [j], [k] and if j = 0, the next [k + l] = 1;
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MaxSize 100
/* Construct the match array */
void BuildMatch(char *pattern,int *match)
{
int n = strlen(pattern);
int dis = 1,s = 0;
while (s+dis < n)
{
if(pattern[s] == pattern[s+dis])
{
match[s+dis] = match[s+dis- 1] +1;
s++;
}
else
{s = 0;dis++;}
}
}
/*KMP*/
int KMP(char *str,char *pattern)
{
int n = strlen(str);
int m = strlen(pattern);
int s = 0,p = 0;
int *match = (int*)malloc(sizeof(int)*m);
memset(match,- 1.sizeof(int)*n);
BuildMatch(pattern,match);
while (s < n && p < m)
{
if(str[s] == pattern[p]){s++; p++; }else if(p > 0)p = match[p- 1] +1;
else s++;
}
return p == m ? s-m+1: - 1;
}
int main(int argc, char const *argv[]) {
char str[MaxSize],pattern[MaxSize];
printf("Please enter a string :\n");
scanf("%s",str);
printf("Please enter a substring :\n");
scanf("%s",pattern);
int result = KMP(str,pattern);
if(result == - 1)printf("Search failed \n");
else printf('%d\n',result);
return 0;
}
Copy the code