demo#1. BF algorithm – Storm matching algorithmIdeas:
-
The count Pointers I and J are used to indicate the position of the character to be compared in the main string S and mode T respectively. The initial value of I is pos, and the initial value of j is 1.
-
If both strings are compared to the end of the string, that is, I and j are less than or equal to the length of S and T, then the following operation is performed: S[I] and T[j] compare, if equal, then I and j indicate the next position in the string respectively, continue to compare the subsequent characters; If not, the pointer falls back and starts matching again. Starting from the next main string (I = i-j + 2) and recomparing with the first pattern character (j = 1);
-
If j > t. length, it means that each string in mode T is equal to a sequence of consecutive characters in the main string S, then the match is successful, and the serial number of the first character in mode T in the main string S is returned (i-T.length); Otherwise, the match fails and 0 is returned;
typedef char String[MAXSIZE+1]; / / String StrAssign(String T,char *chars) {int I; if(strlen(chars)>MAXSIZE) return ERROR; else { T[0]=strlen(chars); for(i=1; i<=T[0]; i++) // T[i]=*(chars+i-1); T[i] = chars[i-1]; //*(arr+ I) //*(arr+ I) = arr[I] return OK; }} int Index_BF(String S, String T,int pos){// int I = pos; Int j = 1; // int j = 1; / / if I less than the length of S and j is less than the length of the T loop continues while (I < = S [0] && j T [0]) < = {/ * comparison of two letters are equal, continued to compare * / if (S = = [I] T [j]) {i++; j++; }else{// if the pointer is not equal, then the pointer is rematched. // If the pointer is rematched, then the pointer is rematched. // add 1, because the substring starts with 1; // Add one more element from the first bit of the last match; i = i-j+2; // return j to the first place of the substring T j = 1; If (j >T[0]) {if (j >T[0]) {return i-t [0]; }else{ return -1; }}Copy the code
If the parent string is 50 zeros and one one, and the substring is 10 zeros and one one then BF is a waste of time, so RK is introduced.
#2. RK algorithm #####Hash (Hash). You can also directly transliterate “hash”; Hashing is a common tool in development! For example, the commonly used MD5 algorithm is the hash algorithm; Hash algorithm is widely used in security, which is generally reflected in the following aspects:
- File check
- A digital signature
- Authentication protocol ##### Basic idea of RK algorithm HASH! If two strings have different hash values, they must be different; If they have the same hash value, they are not necessarily the same. Fairly RK algorithm basic idea is: the pattern string of P hash value with each of the main string S a substring of length is | | P hash value comparison. If they are different, they are definitely not equal; If they are the same, I will compare them with you. Advantages are:
- Divide the parent string by the length of the pattern string and compare the hash values of the substrings
- You hash all the substrings and you compare them, you don’t hash all the substrings and then you compare them
##### hash values allow different character combinations to be mapped to different numbers by some calculation of a formula! For example, compare “ABC” and “CDE”; Compare 123 and 456; Is it the same? 657 = 6 *10 *10 + 5 *10 + 7 *1 657 = 6 *10^ 2 + 5 *10^1 + 7 *10^0 so the letter converts to the hash value “CBA” = ‘c’ * 2626 + ‘b’ * 26 + ‘a’ * 1 = 2 * 26 * 26 + 126 + 0 * 1 = 1378 CBA = C * 262 + B * 261 + a * 260 = 2 * 262 + 1 * 261 + 0 * 260 = 1352 + 26 + 0 = 1378 ###### Substring hash solution rule: two adjacent substrings S [I] and S [I +1] (I represents the starting position of the substring from the main string, and the length of the substring is m). The corresponding hash formula has intersection. That is, we can use s[I -1] to compute the hash of s[I]; ✖ 102 s [I] = 1 + 2 ✖ 101 + 7 ✖ 100 s [I + 1) = 2 ✖ 102 + 7 ✖ 101 + 4 ✖ 100 s [I + 1) = 10 ✖ ✖ 102) (127-1 + 4 s [I + 1) = 10 ✖ (s – [I] 1 * 102) + 4 s[I +1] implementation shows the last s[I] stripped of the highest bit data and the remaining M-1 is a character multiplied by d-base. Plus the last character to get;
#define d 26 //4. When the HashValue of the schema string and substring are found to be the same. Int isMatch(char *S, int I, char *P, int m) {int is, IP; for (is = i, ip = 0; is ! = m && ip ! = m; is++,ip++) if (S[is] ! = P[ip]) { return 0; } return 1; } // 1;} // 1; int getMaxValue(int m){ int h = 1; for(int i = 0; i < m - 1; i++){ h = (h*d); } return h; } /* * RK algorithm for string matching * Author: Int RK(char *S, char *P) {//1. N: string length, m: string length int m = (int) strlen(P); int n = (int) strlen(S); Printf (" main string length :%d, substring length :%d\n",n,m); //A. The hash value of the pattern string; St. Hash value of main string decomposition substring; unsigned int A = 0; unsigned int St = 0; // Loop [0,m] to get the HashValue of mode A and the first HashValue of main string [0,m] // Then main string :" abcaAdddabceEFFccdd" It [0, 2) is ab / / the pattern string: "cc" / / cc = 26 26 ^ ^ 1 + 2 * 2 * 0 = 52 + 2 = 54; / / ab 26 ^ 1 + 1 = 0 * * 26 ^ 0 = 0 + 1 = 1; the for (int I = 0; I! = M; i++) {/ / for the first time A = 0 * 26 + 2; / / the second A = 26 + 2 * 2; A = (A + d * (P [I] - 'A')); / / the first st = 0 * 26 + 0 / / second st = 0 * 26 + 1 st = (d * st + (S [I] - 'a'));} / / 3. The values for d ^ m - 1 (because often want to use d ^ m - 1) hexadecimal values int hValue = getMaxValue (m); / / 4. Traversal (0, n - m], // If the HashValue is not the same as the HashValue of the other substrings, then the next HashValue will be calculated. // If (st[0] = st[0]) (st[0] = st[0]) (st[0] = st[0]) (st[0] = st[0]) (st[0] = st[0]) (st[0] = st[0]) (st[0] = st[0]) (st[0] = st[0]) For (int I = 0; I <= n-m; for(int I = 0; I <= n-m; for(int I = 0; I <= n-m); for(int I = 0; I <= n-m); for(int I = 0; I <= n-m); for(int I = 0; I <= n-m); I++) {if (A = = St) if (isMatch (S, I, P, m)) / / 1 reason, count from 1 to start the return I + 1; St = ((St - hValue * [I] - 'A') (S) * d + (S + m] [I - 'A'));} return -1; }Copy the code
- If hash conflict is not done, the number of second check comparison is N-m +1; So time order n.
- But it is possible to resolve the conflict. You need to add a double check! So you need m comparisons; So the time complexity is O(n*m);