Topic:

Given two strings, find their longest common string

var str1="abcdefg";
var str2="xyzabcd";Copy the code

Explanation: For example, in the words “abcdefg” and “abcdefg” their longest common subsequence is “abcd”. The search for firstborn sequences is often used in genetics to describe DNA using the first letter of a nucleotide base.

The difference between the longest common substring and the longest common subsequence.

The difference between the longest common subsequence and the longest common subsequence is as follows: a subsequence is a continuous part of a string, while a subsequence never changes the sequence order, but removes any element from the sequence to obtain a new sequence; That is, the position of the characters in the substring must be continuous, and the subsequence need not be.

This problem, thought for a long time, finally slowly made the topic, the following content may be difficult to understand, maybe I am more stupid, at least for me to think for a long time. I will try to combine pictures and texts to show the idea of solving the problem, or I can think about it by myself and then read the following content.

Brute force

In fact, seeing this problem we can directly use violence to solve the problem. Given two strings A and B, we can find the longest common string by comparing the first character of A with each corresponding character of B. If no matching letter is found, move to the second character of A, then compare from the first character of B, and so on. Because the following content is more, I will not write the explosion method here.

Dynamic programming

Let’s review the solution idea of dynamic programming:

  1. Start at the bottom, solve all the small problems, and then combine them into an overall solution.
  2. Use an array to create a table of solutions that are broken down into subproblems.

So, based on these two points, the first thing we have to do is break down this problem. How do we break it down?

Step 1: The minimum is per character, so break it down into individual characters to match each individual character. So here we get the following table:

Match is 1, no match is 0, at which point we break it down into matches per character, and we get.

Step 2: When we have each subproblem, we need to create an array to store the solutions of each subproblem. The key question is, how can we store the solution of each subproblem and get the result we want. Do some processing on the table and initialize a two-dimensional array:

var arr = new Array(str1.length + 1); for (var i = 0; i <= str1.length + 1; i++) { arr[i] = new Array(str2.length + 1); for (var j = 0; j <= str2.length + 1; j++) { arr[i][j] = 0; }}Copy the code

The following table is obtained:

We can see that we’re adding one more row and one more column with zero, just to see if the last one is equal, and you’ll see what that means later.

Do some processing on the initialization of the two-dimensional array, resulting in the following figure:

This graph may seem confusing at first, but if you already understand it, you can skip this nonsense. We should always tell ourselves that this new array is the solution for each subproblem, and the first thing we need to figure out is the longest string, what are the ways we can cut the string from the original string, so we need to know the starting position and the length of the string. As you can see from the figure, the scarlet letter is the optimal length of the solution string to store the location, so two variables should be created to store the location’s index and the maximum length maxLen. Here’s the code:

var maxLen = 0; var index = 0; for(var i = 0; i <= str1.length; i++){ for(var j = 0; j <= str2.length; j++){ if(i == 0 || j == 0){ arr[i][j] = 0 }else{ if (str1[i] == str2[j] && str1[i - 1] == str2[j - 1]) { arr[i][j] = arr[i - 1][j - 1] + 1; }else{ arr[i][j] = 0; } } if(arr[i][j] > maxLen){ maxLen = arr[i][j]; index = i; }}}Copy the code

So just to make it simple, we’re going to specify the position of a continuous string, which is the value on the diagonal of a two-dimensional array, and that makes a lot of sense.

The above code finds the two main arguments, so the next step is to select one of the strings to intercept the string:

var str = "";
if(maxLen == 0){
    return "";
}else{
    for(var k = index - maxLen; k < maxLen; k++){
        str += str1[k];
    }
    return str;
}Copy the code

The complete code is posted below

function LCS(str1, str2){
    var maxLen = 0;
    var index = 0;

    var arr = new Array();
    for (var i = 0; i <= str1.length + 1; i++) {
        arr[i] = new Array();
        for (var j = 0; j <= str2.length + 1; j++) {
            arr[i][j] = 0;
        }
    }

    for(var i = 0; i <= str1.length; i++){
        for(var j = 0; j <= str2.length; j++){
            if(i == 0 || j == 0){
                arr[i][j] = 0
            }else{
                if (str1[i] == str2[j] && str1[i - 1] == str2[j - 1]) {
                    arr[i][j] = arr[i - 1][j - 1] + 1;
                }else{
                    arr[i][j] = 0;
                }
            }
            if(arr[i][j] > maxLen){
                maxLen = arr[i][j];
                index = i;
            }
        }
    }

    var str = "";
    if(maxLen == 0){
        return "";
    }else{
        for(var k = index - maxLen; k < maxLen; k++){
            str += str1[k];
        }
        return str;
    }
}
var str1="abcdefg";
var str2="xyzabcd";
LCS(str1, str2)     // abcdCopy the code

At this point we’ve finally found the longest public string. At this point we need to think about whether we can optimize again, although this solves the problem, but introduced a two-dimensional array, in the case of two large strings is not cost-effective, can we further optimize? The answer is yes, need to change the strategy, can you build a one-dimensional array can solve the problem, here too late to write so much, then I will post in the public account violent solution and dynamic programming optimization solution, welcome to continue to pay attention to my public account for first-hand information.