Writing in the front

  • Full text of data structure and algorithm

String Matching Problem (KMP algorithm)

Violence matching

public static int violenceMath(String str1, String str2) { char[] s1 = str1.toCharArray(); char[] s2 = str2.toCharArray(); int s1len = s1.length; int s2len = s2.length; int i = 0; int j = 0; while (i < s1len && j < s2len) { if (s1[i] == s2[j]) { i ++; j ++; } else {I = I - (j-1);} else {I = I - (j-1); j = 0; }} if (j == s2len) {// If (j == s2len) {return i-j; } else { return -1; }}Copy the code

KMP algorithm

  • The partial match value is the length of the common element with the longest prefix and suffix
  • Offset from the partial match value, one thing that can be determined is that there are two parts that are currently not being matched successfullyThe part that matches successfullyIf there is overlap at the beginning and end of the substring, it can be taken from the target stringThe part that matches successfullyWe can ensure that there must be a success rate from the overlapping position

For example, the two strings str1ABCDABDDD and str2ABCDABC are actually compared from the beginning. When the position of STR1 [5] is compared with str2[5], it is found that the match is not successful. However, the beginning and end of ABCDAB in STR2 have two repeated strings, and the partial match value is 2. The first condition that can be determined is that the first six of STR1 are equal to the first six of STR2. The second condition is that only the first two and the last two ABCDAB suffixes in STR2 are repeated. Then it is clear that starting from any one of str1ABCD will not succeed because the partial match value is only 2. That is, the string AB cannot be found at any position in BCD of str1. AB can only be found after D. That is, the string length – part matching value has been moved

So if the string is str1ABCDDD and str2ABCDE, which represents the common matching string, and the partial matching value is 0, then it means that the string ABCD in STR1 cannot be matched from any position and the string must be skipped

Code and Explanation

  • KMP search algorithm
/** * @param str1 * @param str2 * @param next Public static int kmpSearch(String str1, String str2, int[] next) {// for (int I = 0, j = 0; i < str1.length(); I++) {// if the value of j is not equal, adjust the value of j. In other words, while (j > 0 && str1.charat (I)!), while (j > 0 && str1.charat (I)!), while (j > 0 && str1.charat (I)! = str2.charAt(j)) { j = next[j - 1]; } if (str1.charAt(i) == str2.charAt(j)) { j++; } if (j == str2.length()) { return i - j + 1; } } return -1; }Copy the code
  • Gets partial match values for each character
Public static int[] kmpNext(String dest) {public static int[] kmpNext(String dest) {public static int[] kmpNext(String dest) {public static int[] kmpNext(String dest) { next[0] = 0; For (int I = 1, j = 0; i < dest.length(); While (j > 0 &&dest. charAt(I)! = dest.charAt(j)) {j = next[j-1]; } if (dest.charat (I) == dest.charat (j)) {// if (dest.charat (I) == dest.charat (j)) { } next[i] = j; } return next; }Copy the code
  • The core codej = next[j-1]The premise is to calculate the matching value by hand

Next [3]=2 next[I]=2 next[I]= 1 next[I]=2 next[I]=2 next[I]=2 next[I]=2 For example, STR [0…1]= STR [1…2]); The subscript +1 that also represents the prefix matching the current position (for example, next[5]=1 means a character with subscript 5, the same as STR [0], i.e., STR [5]= STR [0]), pointing to the next position of the prefix

The second step is to explain the meaning of the backtracking j operation in obtaining partial matching values of each character. The first conclusion is to check whether the character to be checked already has partial prefixes. To illustrate this process, there are two concepts: small prefixes and large prefixes, such as strABABABC, which after the previous comparison is now at I =6, next[0~5]={0,1,2,3,4}, where j=4. For the previous string ABABAB, if STR [6]=A, Next [5]=5, which is obvious, but STR [4]! = STR [5], then it would be reasonable to set j=0, i.e. match from the beginning, but this is not true for CBCC, CBCC b; If STR [2] is not equal to STR [5], then we can prove that we can only go back to j=0. So the key question is, why is it possible to find a small prefix within a large prefix that is equal to the beginning of the character being compared?

Look at the following example in conjunction with step 3

CBCC CBCC b, next[0…7]={0,0,1,1,1,2,3,4} STR [4] ={0,0,1,1, 2,3,4} Next [7]=4. We can be sure that STR [3]= STR [7], and we can compare STR [1] with STR [8]. Next [3]=1 indicates the position after STR [0]. Next [3]=1 indicates the position of the fourth character. The longest length is 1, so STR [3]= STR [0]= STR [7]. Next [3] =next[4-1] =next[4-1] =next[4-1] =next[4-1] By abstraction, as long as m = next[j-1] is non-zero, then STR [m-1] is equal to STR [j-1], and STR [m-1]= STR [i-1] is guaranteed to look for prefixes of the current length of the string

Step 3: Answer the questions in step 2. Next [Y]=X, STR [Y]= STR [x-1]; So suppose next [1] is not zero, and the next [1] = m, at this point represents the STR [1] = STR [m – 1), and the first round of comparison, the comparison is STR [I] and STR [j], when to enter the circulation, represents j > 0, that is to say, the STR [1] = = STR [I – 1), J = next[j-1] = STR [next[j-1]] = STR [I] = STR [m] = STR [I] From the next [I] = n – > STR [I] = STR [n – 1), we can launch next [1] = m – > STR [1] = STR [m – 1), then according to the equal transitivity, we can finally get the STR [m – 1] = STR [1] = STR [I – 1), J = next[J-1]; j = next[j-1]; j = next[j-1]; The main purpose is to check whether the character to be checked already has a partial prefix

The fourth step, explain the meaning of the core code of the KMP search algorithm. Compare the prefixes and suffixes in two strings (str1 and str2). Since I keeps moving backwards, it can be reversed as str1 does not move, str2 moves backwards, and once a character is not matched, I does not move, assuming that STR2 already has part of the same string with STR1, that is, j>0, then offset is needed. Moving j forward to the last bit of the small prefix in the previous three steps, while I does not move, shifts the prefix of STR2 to the suffix of the same string in str1. So if it’s 0, it’s going to offset all of str2, because it’s up, but it’s not going to match up, and it’s not going to be the same string no matter where you start, so align the 0 of str2 with I; For example, two strings are str1ABCCABD… With str2ABCCABE, when matching D and E, we need to move str2 backward. Obviously, we need to move it to C in one breath, that is, the suffix of ABCCAB, so as to ensure that the prefix AB of STR2 matches, and then traverse backwards

Set coverage problem (Greedy Algorithm)

  • In solving the problem, inTake the best or best option at each step, so that the result is the best or optimal algorithm, tends to the optimal solution, through conditions to make their comparison more strict
  • Find the smallest combination of radio stations that can cover all areas
HashMap<String, HashSet<String>> broadcasts = new HashMap<>(); HashSet<String> hashSet1 = new HashSet<>(); HashSet1. Add (" Beijing "); HashSet1. Add (" Shanghai "); HashSet1. Add (" tianjin "); broadcasts.put("k1",hashSet1); HashSet<String> hashSet2 = new HashSet<>(); HashSet2. Add (" Beijing "); HashSet2. Add (" guangzhou "); HashSet2. Add (" shenzhen "); broadcasts.put("k2",hashSet2); HashSet<String> hashSet3 = new HashSet<>(); HashSet3. Add (" chengdu "); HashSet3. Add (" Shanghai "); HashSet3. Add (" hangzhou "); broadcasts.put("k3",hashSet3); HashSet<String> hashSet4 = new HashSet<>(); HashSet4. Add (" Shanghai "); HashSet4. Add (" tianjin "); broadcasts.put("k4",hashSet4); HashSet<String> hashSet5 = new HashSet<>(); HashSet5. Add (" hangzhou "); HashSet5. Add (" dalian "); broadcasts.put("k5",hashSet5); HashSet<String> allAddresses = new HashSet<>(); Tunes. forEach((s, strings) -> strings.foreach (s1 -> alladdresses.add (s1)))); System.out.println(allAddresses); ArrayList<String> selects = new ArrayList<>(); HashSet<String> tempSet = new HashSet<>(); // Selects String maxKey = NULL if maxkey is not null; while (allAddresses.size() ! = 0) {// If maxKey is not set to 0, not all regions are covered. Otherwise, the intersection will be greater than 0, but less than the number of regions covered by the broadcast station to which maxKey points. MaxKey = NULL; Tunation for (String key: tunation. KeySet ()) {// Each time for, clear the intersection in order to get the next intersection tempset.clear (); HashSet<String> addresses = broadcasts. Get (key); tempSet.addAll(addresses); RetainAll (allAddresses); // Assign the intersection of tempSet and allAddresses to tempSet to obtain the current key that can override the region tempSet. RetainAll (allAddresses); //maxKey == null indicates that no suitable radio station has been found. // tempset.size () > tunes.get (maxKey).size() More than the last key's location intersects with allAddresses, Purpose is to choose the optimal solution of each traversal if (tempSet. The size () > 0 && (maxKey = = null | | tempSet. The size () > broadcasts. Get (maxKey). The size ())) {maxKey = key; } // If (maxKey! = null) { selects.add(maxKey); Alladdresses.removeall (broadcasts. Get (maxKey)); } } System.out.println(selects);Copy the code