I read a lot of articles, once it comes to k=next[k] in getNext, many articles are just a brush brush, or many articles will say that finally understand KMP algorithm, but here is still a brush brush, more and more confused. This note focuses on the logic k = next[k], and assumes you already understand the rest.

I finished writing my notes, but I found that only I seemed to understand them, which was very embarrassing. If you still feel vague at the end, you might as well draw a picture by yourself. If you still don’t understand K = next[K], you can discuss it privately.

public static int KMP(String ts, String ps) {
    char[] t = ts.toCharArray();
    char[] p = ps.toCharArray();
    int i = 0, j = 0;
    int[] next = getNext(ps);
    while (i < t.length && j < p.length) {
        if (j == -1 || t[i] == p[j]) {
            i++;
            j++;
        } else{ j = next[j]; }}if (j == p.length) {
        return i - j;
    }
    return -1;
}

/** * ABCDEEEEEABCEABCDEEEEEABCF * ABCDEEEEEABCD * A B C D E E E E E A B C D * [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3] * /
public static int[] getNext(String ps) {
    char[] p = ps.toCharArray();
    int[] next = new int[p.length];
    int j = 0, k = -1;
    next[0] = -1;
    while (j < p.length - 1) {
        if (k == -1 || p[j] == p[k]) {
            // There is an optimization point here, but this is only the most primitive code, how to optimize can refer to other articles
            next[++j] = ++k;
        } else {
            k = next[k];
        }
    }
    System.out.println("next = " + Arrays.toString(next));
    return next;
}
Copy the code

A few conclusions:

A few conclusions. KMP is designed with the exception of getNext, assuming you already understand. For primary char[] t, subchar [] p

  • 1, next[j]: when p[j]! = t[j], the maximum length of p string with the same prefix and suffix.
  • 2, next[j] = -1: indicates that the primary and substrings need to be moved right

Combine these two conclusions to analyze the getNext method

GetNext array solution procedure

public static int[] getNext(String ps) {
    char[] p = ps.toCharArray();
    int[] next = new int[p.length];
    int j = 0, k = -1;
    next[0] = -1;
    while (j < p.length - 1) {
        if (k == -1 || p[j] == p[k]) {
            // There is an optimization point here, but this is only the most primitive code, how to optimize can refer to other articles
            next[++j] = ++k;
        } else {
            k = next[k];
        }
    }
    System.out.println("next = " + Arrays.toString(next));
    return next;
}
Copy the code

I read a lot of articles, all of them said that the core is K = next[k], all of them said that this place is the core, but they all chose to skip over, and wasted a lot of time on useless articles when learning KMP.

1. Analyze the process

  • 1, p[j] == p[k]
  • 2, k = next[k];
  • 3, k == -1;

2, p[j] == p[k]

When p [J0]! = t[I] &&p [J0] = p[K0], next[J0] = K0; Next [J1] = K1?

Derivation formula is: [J0] = P = P (K0), next [J0] = = = = = K0 is derived in this paper > next [J1] = = K0 +1 == K1;
Copy the code

3, K = next[K]

Next [K] = next[K] = next[K] = next[K]

So it’s hard to make sense of this whole thing without talking about complete induction

Combined with the flow of Formula 2: (1) suppose next [J0] = K0 && P [J0] = = P (K0), so there are next [J1] = = K0 +1 == K1
(2If P[J1] == P[K1], next[J2] = K1 +1= K2; As shown in the figure below, the position of K2 can be directly deduced in time and the value of next[J1...JN] can be obtained successivelyCopy the code

If P (J1)! = P[K1]

So if P[J1]! = P[K1], next[J2] = P[K1] At this point, K1 needs to be moved to the left until K3 is found to ensure that P[J1] == P[K3], then next[J2] == K3 begins to deduce K = next[K] formula:1K1 shifts left to get the longest common prefix and suffix K3 node of J2, where next[J2] = K3, P[0 ~ K3-1] == P[ J1-K3+1 ~ J1]
(2P[J1] == P[K3 -1], P[J0] == P[K3 - 2]...
(3Remember P[J0] == P[K0]? (4) according to (3P[K0] == P[K3 -1]
(5) step (2) and steps (4(Is it the same? Combined with K3 < K1 and P[K0] == P[K3 -1], note that this is for any node K, not specifically where it is in the array. Therefore, it can be concluded that K3 is the node with the longest common prefix of K1. Since K3 is the node with the longest common prefix of K1, does it satisfy K == next[K]? (6) According to the steps (5), since the node K3 found is the node with the longest common prefix and suffix of K1, the value of K3 can be obtained directly by K = next[K].Copy the code

In fact, the above six steps are completely grasping the cause of the fruit. According to the conclusion, the origin of K = next[K] is backward deduced by complete induction.

4, K = = 1

When P (J1)! = P[K1], K needs to be moved to the left. When K moves to the initial position -1, the longest common prefix and suffix node is still not found, which indicates that J2 has no common prefix and suffix at this time, next[J2] == 0, that is ++K (because at this time K == -1).