KMP

Violent algorithm is in the case of mismatch, backtrace the substring pointer, match from scratch. KMP mismatches try to move forward more aggressively, skimming over impossible matches. The length of the forward move is determined by a “partial match table”.

(CLRS Introduction to Algorithms, 3rd Ed 1)

Partial match in front of the table is really a mismatch that position (matching) on the inside of the longest suffix length before public, move is to prefix directly against the suffix that position, just skip the middle can’t match (already swept, behind do not match, repeat and still does not match) of that paragraph, began to try a possible match directly.

Partial match table

Refer to Wikipedia’s algorithm 2 (see English page, some paragraphs missing).

By hand:

func GetNext(substr: string) []int {  / / substr: [0... M - 1)
    M := len(substr)
    next := make([]int, M)

    next[0] = - 1
    for j := 1; j <= M; j++ {
        next[j] = len(longest common suffix (substr[0:j])  // before the j character)}return next
}
Copy the code

The longest common prefix and suffix is for example:

(Picture from Ruan Yifeng “STRING Matching KMP Algorithm” 3)

Programming quick calculation:

func GetNext(substr string) []int {
	next := make([]int.len(substr))
	next[0] = - 1

	i, j := 0.- 1
	for i < len(substr)- 1 {
		if j == - 1 || substr[i] == substr[j] {
			i++
			j++
			next[i] = j
		} else {
			j = next[j]
		}
	}
	return next
}
Copy the code

matching

KMP:

func KMP(s string, substr string) int {
	next := GetNext(substr)

	i, j := 0.0
	for i < len(s) && j < len(substr) {
		if j == - 1 || s[i] == substr[j] {
			i++
			j++
		} else { // miss matching
			j = next[j]
		}
	}

	if j >= len(substr) { // matched
		return i - len(substr)
	}
	return - 1 // not matched
}
Copy the code

This program actually works:

package main

// ...

func main(a) {
	s := "BBC ABCDAB ABCDABCDABDE"
	sub := "ABCDABD"

	res := KMP(s, sub)
	println(res)
}
Copy the code

Next array

Add a point: partial matching table, there are several different versions, such as “Introduction to algorithms” 1, and for example, our postgraduate examination book (Tianqin Wang Dao) 4,^, 5.

(The partial matching table is affectionately called “the next array”…)

Here’s an example of the difference:

index	0	1	2	3	4	5	6
char	a	b	a	b	a	c	a
CLRS	0	0	1	2	3	0	1
Wikipedia	– 1	0	0	1	2	3	0	= CLRS moves one to the right and complements the left`- 1`
Day frequently king	0	1	1	2	3	4	1	= Wikipedia + 1

If mismatch at original (large) index I, schema (substring) index j:

The CLRS moves toT[j-1](Special cases:j=0 则 i+=1)
Wikipedia moved toT[j](Special cases:j=-1 则 j+=1).

Better understood code

To be honest, I can’t read the code above. Mainly is the period of the if (j = = 1 | | s [I] = = substr [j]) {… } WTF????

Studied for a long time to understand clearly is two different cases, just happened to be the same processing methods merged together, put here to write assembly. Throw up.

It is easier to understand if the two cases are written separately by referring to answer 6 on Zhihu:

func GetNext(substr string) []int {
	next := make([]int.len(substr))
	next[0] = - 1

	i, j := 0.- 1
	for i < len(substr)- 1 {
        if j == - 1 { // special: end of the line: the smallest, can not be reduced
			i++
			j = 0
			next[i] = j
		} else if substr[i] == substr[j] { // Add one more bit
			i++
			j++
			next[i] = j
		} else { // Mismatch: shrink, re-match
			j = next[j]
		}
	}
	return next
}

func KMP(s string, substr string) int {
	next := GetNext(substr)

	i, j := 0.0 // I of s and j of substr
	for i < len(s) && j < len(substr) {
        if j == - 1 { // special: go back to the head: I move one, j compare from the head
			i++
			j = 0
		} else if s[i] == substr[j] { // add: next character
			i++
			j++
		} else { // mismatch: j moves to the next possible match
			j = next[j]
		}
	}

	if j >= len(substr) { // matched
		return i - len(substr)
	}
	return - 1 // not matched
}
Copy the code

Improve the KMP

~~Improve 🔨, to sleep.~~

Nextval nextVal nextVal nextVal nextVal

Nextval construction rules are as follows:

nextval[0] = - 1;
for (int j = 1; j < len(substr); j++) { nextval[j] = (substr[j] ! = substr[next[j]] ? next[j] : nextval[next[j]] ); }Copy the code

If the character of next[j] is the same as that of the current character, it is not necessary to go back step by step (which has already been done before), but directly borrow the previous result and quickly go back to the end.

e.g.

substr	A	B	A	B	A	A	B
j	0	1	2	3	4	5	6
next	– 1	0	0	1	2	3	1
nextval	– 1	0	– 1	0	– 1	3	0

(Remember that the test book starts at 1 and increments next and Nextval.)

reference

Cormen T H, Leiserson C E, Rivest R L, et al. Introduction to Algorithms, 3rd Ed.↩
Wikipedia. Knuth, Morris, Pratt algorithm. En.wikipedia.org/wiki/Knuth -… ↩
Nguyen. String matching KMP algorithm. www.ruanyifeng.com/blog/2013/0… ↩
High Score Notes on Data Structure (10th edition of Tianqin in 2022). China Machine Press, 2020.↩
Wang Dao Forum. 2022 Data structure Test review guide. Publishing House of Electronics Industry, 2021↩
Ruan Xingzhi. How to better understand and master KMP algorithm? . www.zhihu.com/question/21… ↩

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Notes on KMP algorithm

KMP

Partial match table

matching

Next array

Better understood code

Improve the KMP

Recommended reading

reference

Notes on KMP algorithm

KMP

Partial match table

matching

Next array

Better understood code

Improve the KMP

Recommended reading

reference

Related Posts

SpringBoot uses PageHelper for data paging

Application layer with microservices and service discovery

Initial experience of Ali RPC framework DUBBO