Manacher algorithm

1. The introduction of

Manacher algorithm and KMP algorithm are common algorithm prototypes for solving string-related problems, but the problems they solve are different.

The Manacher algorithm was originally designed specifically to solve the “longest palindrome substring problem in a string”. In fact, there is a very important piece of information in the Manacher algorithm that can be used to solve many other problems. This information is the longest palindrome radius of each position.

2. The palindrome

Palindrome colloquially means that the content is the same when viewed forward and backward.

Palindromes are defined as having an axis of symmetry where the left part is the reverse of the right part (the right part is the reverse of the left part).

For example, the strings “abcba” and “abba” are palindromes.

3. The longest subroutine string

Finding the longest substring is essentially finding which substring in a string is palindrome and is the longest.

Each character in a substring is required to be contiguous (or subsequence if the character is not contiguous).

For example, in the string “abc1232de1”, the longest substring is “232”.

4. Classical solution

We can imagine a way of doing this ourselves, such as treating each character in the string as an axis of symmetry, and then matching both sides at the same time. An obvious problem with this method is that if the length of the palindrome string is even, it cannot be detected, because the symmetry axis of the palindrome string with even length is actually “virtual” and cannot be located.

Suppose the current string is “122131221”.

If we had used the method we originally imagined, the “1221” substring would not have been detected.

We improved on the original imagined method by processing the original string before matching. This is done by adding a “#” to the left and right of the string and a “#” between every two characters. #1#2#2#1#3#1#2#2#1#” Then treat every character in the processed string as an axis of symmetry, and start matching to the left and right, recording the length of the substring. Finally, divide the length of the substring calculated at each position by 2 (rounded down), and the corresponding length of the substring corresponding to the original string is the length of the substring of symmetry at this position.

Using the improved method, both the odd number and the even number of subroutines can be detected.

We want to ask a question: should we add auxiliary characters to the original string that are not present in the original string?

Whatever the auxiliary character is, it doesn’t affect the final answer. Because no matter which character of the processed string is matched from left to right, it is always auxiliary character to auxiliary character ratio, real character to real character ratio.

Code implementation:

public static int plalindrome(String str) {
    if (str == null || str.length() == 0) {
        return -1;
    }

    char[] c = str.toCharArray();

    char[] newC = new char[c.length * 2 + 1];
    newC[0] = The '#';

    // a pointer to c
    int i = 0;
    // a pointer to newC
    int j = 1;

    // Use the auxiliary character # to process the original string
    while (i < c.length && j < newC.length - 1) {
        newC[j ++] = c[i ++];
        newC[j ++] = The '#';
    }

    return process(newC);
}

public static int process(char[] str) {
    // Palindrome array
    int[] next = new int[str.length];

    // Build the palindrome area array
    for (int i = 0; i < str.length; i ++) {
        int j = i - 1;
        int k = i + 1;
        // The palindrome field is at least 1, for itself
        int count = 1;
        // Match each character on both sides
        while (j >= 0&& k ! = str.length) {if(str[j] ! = str[k]) {break;
            }
            j --;
            k ++;
            count = count + 2;
        }
        next[i] = count;
    }

    // Sort the palindrome array in ascending order
    Arrays.sort(next);

    // Divide the largest element in the palindrome array by 2 to get the maximum length of the substring
    return next[next.length - 1] / 2;
}
Copy the code

Let’s estimate the time complexity using this method:

For a worst-case example, the original string is: “1111111”. #1#1#1#1#1#1#1

If the character is on the left side of the symmetry axis, it will be matched to the left edge. If the character currently used as the axis of symmetry is in the right half, it must match the right-most boundary.

Therefore, assuming that the original string length is N, the time complexity is O(N^2). We needed to optimize this classical method to create the Manacher algorithm.

Palindrome radius and palindrome diameter

Before we get to Manacher’s algorithm, we need to understand the concepts of palindrome radius and palindrome diameter.

Palindrome diameter: The total number of characters counted from the symmetry axis to the edge of the palindrome area.

Palindrome radius: The total number of characters counted after the boundary of the palindrome area is derived left or right from the symmetry axis.

6. Manacher algorithm

In fact, the processing flow of Manacher and the classical solution is the same. Manacher mainly designs accelerated operation like KMP, and Manacher can optimize the time complexity to O(N).

There are three important points in Manacher algorithm design:

  • A palindrome radius array needs to be constructed by calculating the palindrome radius for each character in the string being processed by the auxiliary character.
  • Set a variable R to record the rightmost index of the palindrome boundary (initial value -1) of all characters that previously matched the palindrome region.
  • Set a variable C, used with R, to record the subscript of the center of the palindrome from which the right-most subscript is currently obtained (initial value -1, if the right-most subscript coincides, use the original center subscript).

The second point is a little harder to understand, as shown below:

R is always incremented, and the subscript is updated to R whenever the character palindrome is further to the right.

The third point is a little harder to understand, as shown below:

C is always increasing, R updates C must update, R does not update C must not update.

7. The process

First, we need to build the palindrome radius array. When we build the palindrome radius array, we encounter two big cases:

First big case: the position of the currently matched character is not in the right-most boundary of the palindrome region of the previously matched character. In this case, there is no optimization, only from the center point to both sides of the violent expansion matching, and calculate the palindrome radius of the character.

Such as:

Second big case: the position of the currently matched character is in the right-most boundary of the palindrome region of the previously matched character (as in the case above when R=2).

When the second big case occurs, there must be a general topology as shown below:

I is the position of the current matched character, and I ‘is the symmetry point of I made by symmetry axis C. C, L, and R must all exist.

It is impossible for I and C to overlap, because C represents the center of the longest loop substring constructed of characters before I. By the time you get to position I, C must have already traversed.

The second large case can be divided into three specific cases according to the condition of the palindrome region of I ‘, each of which has a separate optimization.

(1) The first case: the palindrome region of I ‘is completely inside L~R

In this case, the palindrome radius of I is the palindrome radius of I ‘.

Make the region C ~d equal to I ‘in the I bit. Since the whole L~R is a palindromic string centered on C, a~b and C ~d must be symmetric about C, and C ~d must be the reverse order of A ~b. A ~ B is a palindrome string, and the reverse order of the palindrome string is also a palindrome string, so C ~ D must be at least as large as A ~ B. Why can’t c~ D be bigger? Need proof.

Set the first character to X and the last character to Y in areas A to B. Set the first character of the c to D area to Z and the last character to K.

Why didn’t I expand my palindrome to a larger area?

There’s only one reason, and that’s X! = Y.

And since X is symmetric with K, Y is symmetric with Z, so X == K, Y == Z.

So the Z! = K, the palindrome region of I can no longer expand.

(2) The second case: part of the palindrome area of I ‘is outside L~R.

In this case, the palindrome radius of I is I ~R.

L is the symmetric point L prime of I prime, R is the symmetric point R prime of I.

L~L ‘and R~R’ must be in reverse order, L~L ‘is a palindrome, because L~L’ is inside L~R, so R~R ‘must also be a palindrome, so the palindrome region of I is at least as large as L~L’. Why can’t R~R ‘be bigger? Need proof.

Set the first character from L to L ‘to X and the last character from L to L’ to Y. R~R ‘is Z, and R~R’ is R.

Since X and Y are both in the palindrome region of I ‘and symmetric about I’, X == Y.

And since Y and Z are both in the palindrome region of C and symmetric with respect to C, Y == Z.

Why didn’t C expand its palindrome to a larger size at that time?

There’s only one reason, and that’s X! = K.

So the Z! = K, the palindrome region of I can no longer expand.

(3) The third case: the left boundary of the palindrome region of I ‘just coincides with L.

At this point, the palindrome radius of I cannot be obtained directly, but the minimum palindrome radius can only be determined as I ~R. To make it bigger or not, you need to expand the match out of position I.

However, a constant acceleration can be designed at this time, because I ~R is the minimum palindrome radius determined, so the match can be directly expanded outward from R.

8. To achieve

public static int plalindrome(String str) {
    if (str == null || str.length() == 0) {
        return -1;
    }

    char[] c = str.toCharArray();

    char[] newC = new char[c.length * 2 + 1];
    newC[0] = The '#';

    // a pointer to c
    int i = 0;
    // a pointer to newC
    int j = 1;

    // Use the auxiliary character # to process the original string
    while (i < c.length && j < newC.length - 1) {
        newC[j ++] = c[i ++];
        newC[j ++] = The '#';
    }

    return process(newC);
}

public static int process(char[] str) {
    R-1 is the right subscript of the palindrome before I
    int R = -1;
    / / center
    int C = -1;

    // Palindrome radius array
    int[] next = new int[str.length];

    for (int i = 0; i < str.length; i ++) {
        // Determine the minimum palindrome radius in all cases
        // If I > R represents the first big case, the lowest palindrome radius is itself 1
        // If I < R, it means the second big case:
        // In the first case, the palindrome radius of I 'is smaller than that of r-i
        // In the second case, the palindrome radius of I 'is larger than that of r-i
        // In the third small case, the palindrome radius of I 'is equal to the palindrome radius of r-i
        next[i] = i < R ? Math.min(next[2 * C - i], R - i) : 1;

        // In either case, try to expand out, because only the first big case and the third small case of the second big case need to expand out, so if it is the second big case
        // The first small case or the second small case will also go to the process of scaling out, but will fail the first time and break out of the loop.
        // This is a unified process to omit multiple if-else, optimizes the code length, and does not affect time complexity
        while (i + next[i] < str.length && i - next[i] >= 0) {
            if (str[i + next[i]] == str[i - next[i]]) {
                next[i] ++;
            } else {
                break; }}// Determine whether R and C need to be updated
        if (i + next[i] > R) {
            // R expands to the right
            R = i + next[i];
            // In this case, I is the center point of the palindrome region of all current characters to the far rightC = i; }}// Sort the next array to find the maximum palindrome radius
    Arrays.sort(next);
    int maxPalindromeRadius = next[str.length - 1];

    // The length of the palindrome radius -1 = the length of the substring whose axis is I
    return maxPalindromeRadius - 1;
}
Copy the code

9. Time complexity

Look at the pseudo code (the reason not to show the code is because the code has been optimized for Coding, which is not conducive to the demonstration of the process) :

public static int process(char[] str) {
    intR = ? ;intC = ? ;int[] next = new int[str.length];

    for (int i = 0; i < str.length; i ++) {
        if(I is outside of R) {expand from I; }else { // I is inside R
            if (iThe palindrome field of 'is entirely in L~R) {// O(1) returns I'Palindrome radius of; }else if (iThe left boundary of the palindrome region of 'is to the left of L) {// the operation O(1) returns r-i; } else { // i'The left boundary of the palindromic region of R coincides with L and expands outward from R; }} sort to get the longest palindrome radius, process into the original palindrome string length, return}Copy the code

The first big case, and the third small case of the second big case need to be expanded out to match, so it must fail once.

The first small case of the second big case and the second small case do not need to expand out of the match, so they fail 0 times.

So the cost of failure to scale is O(N) per location.

According to the pseudocode above, don’t look at failures, only successes. Assuming a string length of N, annotated with I and R as references:

I is increasing, and R is increasing.

We bind the act of expanding to R getting bigger, so every time we expand, R gets bigger, and the total amount of R getting bigger is the number of successful enlargements. And the number of expansion failures, you can estimate a total quantity, is order N.

So the whole time is order N.

10. Extended

Manacher’s algorithm can solve the problem of maximum substring of a palindrome, but it is much more than that. The information of the palindrome radius array next can solve a lot of palindrome problems.

The palindrome radius array is the soul of Manacher algorithm, so be sure to know how to solve the palindrome radius array every time you use Mancher algorithm.