As mentioned in the previous article, KMP determines the position of I +1 and the pattern string to be matched next by analyzing the pattern string and the current failed string to determine the longest prefix and suffix.

So the next step is to figure out how to figure out where I +1 and pattern string are going to match after a failed match?

Finite deterministic state machines

In the theory of computation, deterministic finite state automaton or DETERMINistic finite Automaton (DFA) is an automaton that can realize state transition. For a given state belonging to the automata and a character belonging to the alphabet sum of the automata, it can be transferred to the next state (this state can be the previous state) according to the pre-given transfer function.

Baidu Baike explanation.

This algorithm is also used in the determination of the backoff position of KMP.

Next, the backoff determination process of KMP algorithm is analyzed.

The status is classified into two types: 1. Success State In success state, the status is moved forward by one digit. The failure state is more complicated. What you need to consider is whether to fall back to the beginning or to the next digit of the longest prefix. Here's a picture to illustrate.Copy the code

For example, the pattern string to match is ABABCB. Here is the state machine operation for this string

1. State machine operation under the correct state in each step

Enter A for position 1 and 0, and the matching state points to 1. Enter B in position 2 and 1, and the matching state points to 2. Enter A for position 3 and 2, and the matching state points to 3. Enter B for position 4 and 3, and the matching state points to 4. Enter C for position 5 and 4, and the matching state points to 5. Enter B at position 6 and 5. If the matching status continues +1 is greater than the length, the matching status ends.Copy the code

2. Rollback in the failed state

1, 0 C does not match the location of the input, the state to 0. 2, 1 C does not match the location of the input, state to 0. 3, 2 B does not match the location of the input, state to 0. 4, 3, A mismatch, the location of the input state to 1. 5, 4, A mismatch, the location of the input state to 1. 6, 5 A mismatch, the location of the input state to point 1.Copy the code

The next question is: why do some of the backdrops point to 0 and some point to 1?

The pattern string is ABABCB and we said that each position in the pattern string can match A different character, so the important thing to understand here is to match to bit 3 (starting from 0) and enter A and then the pattern string is ABAB. The longest common prefix we found at this point is A. The running status is shown in Figure 1-1 below. If the match fails and we know that it's A, we know that the first string of the pattern is also A and we know that this bit in the pattern string can omit the match and go straight to the first bit of the pattern (starting from 0).Copy the code

With this concept in mind, it was time to consider how to construct such a state machine.

Each position has the potential to match a different character. The fallback of each position is determined by the existing character and the character being matched.

Here is the code implementation:

/** * generates dFA **@param {String} patternString* /
function dfa(patternString) {
    let M = 256; // Represents 255 characters
    let N = patternString.length;
    let dfa = [];
    // Initialize an MxN array
    for(let i=0; i<M; i++){ dfa.push([]);for(let j=0; j<N; j++){ dfa[i].push(0)}}// Initialize a two-dimensional array
    dfa[patternString.charCodeAt(0] [0] =1; // represents the matching state of the first bit plus one
    //j is always forward X represents where j needs to be copied
    for(let j=1,X=0; j<N; j++){for(let C=0; C<M; C++){// Copy status
            dfa[C][j]=dfa[C][X];
        }
        // This is the key step, setting the correct next position of the string currently matched by j
        dfa[patternString.charCodeAt(j)][j]=j+1;
        // check whether the dFA is the same as the previous one, if not, j+1 x is still 0
        // This ensures that the j and x positions are always the same before the next digit can be compared
        CharCodeAt (j)][X] records the position that each character should reach once the match is successful. For example, the longest prefix will continue down, and the rest will not
        X=dfa[patternString.charCodeAt(j)][X];

    }
    return dfa
}
Copy the code