LeetCode HOT 100 (05, Regular expression matching)

Not good enough, hair amount is still much, dental laboratories, can become Buddha.

The importance of algorithm is self-evident. No matter you are a researcher or a popular IT worker recently, you should have certain algorithm ability, which is also a necessary part of the interview. The demonstration of algorithm foundation can often make the interviewer’s eyes shine, which is also an important factor to stand out from most competitors.

However, most people pay more attention to their practical operation ability, focusing on the realization of functions, but ignore the improvement of algorithm ability. Sometimes different algorithms are used to solve the same problem, but the difference in operation efficiency is quite large. After all, we still need to think from the perspective of the customer, and it would be best to bring the user more extreme experience.

All methods are empty, cause and effect is not empty. Taoye was also reluctant to spend too much time on algorithms before, and the algorithm foundation is quite weak. It was not until I entered a new stage of learning that I began to realize the fact of my “food” in the face of various “torture” from my tutor and comparison with others around me.

This topic is the sixth question of LeeTCode HOT 100, the difficulty belongs to the difficulty, mainly examine is regular matching problem.

I feel there is something in this problem, and it took me a lot of time to understand it.

We all know that the regular expression is mainly used for character matching, in reptiles, we will send a request to the server and get the corresponding results, through the certain way in response to the results that we need to extract the target data, and one way you can parse through regular table expression.

The difficulty of this problem is still a little, let me more realize the algorithm “which makes perfect” this feature, this is like brush math problem, the topic has seen the idea, not seen completely unable to start. This problem is the same, mainly through the dynamic programming algorithm to solve, of course, there are other algorithms available, this paper mainly uses the dynamic programming algorithm.

Now, let’s look at this problem.

Title: Regular expression matching

Given a string s and a character rule P, implement a regular expression match that supports ‘.’ and ‘*’.

  • ‘.’ matches any single character
  • The ‘*’ matches zero or more of the preceding elements

By matching, you want to cover the entire string S, not parts of the string.

Source: LeetCode link: leetcode-cn.com/problems/re…

The sample

  • Example 1

Input: s = “aa” p = “a” Output: false Description: “A” cannot match the entire string “aa”.

  • Example 2

Input: s = “aa” p = “a*” Output: true Thus, the string “aa” can be treated as ‘a’ repeated once.

  • Example 3

Input: s = “ab” p = “.” Output: true Description: “.” Can match zero or more (‘*’) characters (‘.’).

  • Example 4

Input: s = “aab” p = “CAB” Output: true Explanation: Since ‘*’ means zero or more, where ‘c’ is zero, ‘a’ is repeated once. So it matches the string “aab”.

  • Example 5

Input: s = “Mississippi” p = “misisp*.” output: false

Train of thought

As mentioned above, regular expressions are often used in crawlers, so there must be a specific module to call, and we can use the RE module to achieve this function:

class Solution(object):
    def isMatch(self, s, p):
        return True if re.match(p+'$',s) else False
Copy the code

The requirement has been implemented, but it is clear that we need to implement this functionality manually rather than calling a third-party module.

Before solving this problem through dynamic programming, it is important to understand what problems dynamic programming can solve.

  • Counting problems, like how many ways can you go from the top left to the bottom right in a matrix. (Each walk direction can only be down or right)
  • Find the maximum value problem, for example, if there are three kinds of coins, the face value is 2 yuan, 3 yuan, 7 yuan respectively, how many coins can 27 yuan be composed of the minimum
  • An existential problem, such as this one, is whether a match can be made in the target string

Of course, we certainly can’t generalize. Specific problems, specific analysis, or according to the actual situation to determine whether the target problem can be solved by dynamic programming.

To use dynamic programming in a problem, we generally need four steps:

  1. Determine the state (last step, subproblem). To solve dynamic programming, you need to define an array. To determine the state, you need to know what each element of the array dp[I] or dp[I][j] stands for.
  2. The transfer equation is to obtain the subproblem of the target problem through the subproblem of the previous step
  3. Initial conditions and boundary cases. Dynamic programming usually uses the previous node to get the value of the next node, so we need to specify its initial conditions and the boundary conditions of the array DP. It’s like induction, or a recursive formula in a sequence.
  4. The order of calculation is what is the last step to achieve before the requirements are realized

This may confuse you a little, but here are some examples to explain:

Example 1: there are three kinds of coins, the face value is 2 yuan, 3 yuan, 7 yuan, how many coins can 27 yuan be composed of

Determining the state (the last step, subproblem) : We know that the problem needs to know the minimum number of coins that can be used. We need to define an array dp of length 27, and dp[I] represents the minimum number of coins needed for the total. Assuming that the last coin has a face value of AKA_kak and requires at least KKK coins, then excluding the total value of the last coin is 27− AK27-A_K27 − AK and the number of coins in front of it, K − 1K-1K −1, the face value should also be the least. Therefore, we converted the original problem into a sub-problem: how many coins can be used to form 27− AK27-A_K27 − AK elements (K − 1K-1K −1)?

Transfer equation: From the above analysis, we can obtain the following recursive formula (transfer equation), which represents the minimum number of coins required to remove the total face value of the last coin + 1


f ( x ) = m i n { f ( x 2 ) . f ( x 5 ) . f ( x 7 ) } + 1 f(x) = min\{f(x-2), f(x-5), f(x-7)\} + 1

Initial conditions and boundary cases: The initial condition is f(0), which indicates that 0 dollar can be made up of 0 coins, and the boundary case is that the total denomination value should be greater than 0 each time the total denomination value is reduced by 2, 5, and 7, otherwise the target value cannot be formed.

Order of calculation: To obtain f(x), the value of f(x-2), F (X-5), f(x-7) must be obtained, which is the minimum number of coins required to remove the denomination of a coin

Through the above analysis, the following Java algorithm can be obtained.

public int coinChange(int[] coin_list, int coin_value) { int[] f = new int[coin_value + 1]; f[0] = 0; for (int i = 1; i <= coin_value; i++) { f[i] = Integer.MAX_VALUE; for (int j = 0; j < coin_list.length; j++) { if (i >= coin_list[j] && f[i - coin_list[j]] ! = Integer.MAX_VALUE) { f[i] = Math.min(f[i], f[i - coin_list[j]] + 1); } } } if (f[coin_value] == Integer.MAX_VALUE) { f[coin_value] = -1; } return f[coin_value]; }Copy the code

** Example 2: ** How many ways can you go from the top left to the bottom right in a matrix? (Each walk can only go down or right), for example, from (0, 0), how many routes to (4, 5)

Time relation, here is a simple analysis.

There are two ways to get to the upper lattice of (4, 5), one is (3, 5) and the other is (4, 4), so the total number of ways to get to (4, 5) is equal to the sum of the above two ways. For this, we can get the following relation, where f[I][j] represents the number of ways to get to (I, j)


f [ i ] [ j ] = f [ i 1 ] [ j ] + f [ i ] [ j 1 ] f[i][j] = f[i-1][j] + f[i][j-1]

The Python implementation code is as follows:

def calc_run_number(m, n):
    import numpy as np
    f = np.zeros([m, n])
    for i in range(m):
        for j in range(n):
            if (i ==0 or j == 0): f[i][j] = 1
            else: f[i, j] = f[i-1][j] + f[i][j-1]
    return f[m-1][n-1
Copy the code

Resemble similar case still has a lot of actually, basically still want much contact ability to go.

Now, let’s go back to the regular matching problem and look at the problem again:

Given a string s and a character rule P, implement a regular expression match that supports ‘.’ and ‘*’.

Matches any single character

  • Matches zero or more of the preceding elements

By matching, you want to cover the entire string S, not parts of the string.

As mentioned earlier, before we use dynamic programming we need to define an array to store the state of the data, and this problem involves two strings, S and P, so we need to define a two-dimensional array, which is the matrix DP. If s and P are s_len and p_len respectively, then the shape value of dp array should be (s_len +1, p_len +1), where +1 is mainly because of the need to introduce an extra row and column, which is used to represent the matching condition of null character “”. Here we can understand as the introduction of initial value.

Dp [I][j] indicates whether s[: I] can be matched by dp[:j] regular expression. The value is True or False. True means a match succeeds, for examples:abc p:a.cFalse means no match, for examples:abc p:ab*This is very important to understand and one of the keys to solving this problem.

Dp [0][0]=True, because null character s can be matched by null character p:

dp = [[False] * (p_len + 1) for temp in range(s_len + 1)]; dp[0][0] = True
Copy the code

Then, additional processing is required for the first line of dp, which is the definition of the initial state. If a character before * is present, we need to define the dp value at that position to be the same as the first two values, because in the regular expression, * means that zero or more of the preceding elements can be matched (note: This is defined on the first line, which means that s can be treated as a “” null character, which is matched by the p[:j] regular) :

for one_row_item in range(p_len):
    if p[one_row_item] == '*' and dp[0][one_row_item - 1]:
        dp[0][one_row_item + 1] = True
Copy the code

After the first line is processed, dp values of other lines need to be filled, and the filling is based on the classification judgment of the previous line: assume that the traversal conditions meet: P [J-1] == S [i-1] or P [J-1] == ‘.’, that is to say, s and P have the same corresponding characters, or can pass. Dp [I][j] = dp[i-1][j-1]; dp[I][j] = dp[i-1][j-1]; If the regular character is *, dp[I][j] needs to be jointly determined according to dp[I][j-2], dp[I][j-1] and dp[i-1][j]. The complete algorithm idea is as follows:

class Solution: def isMatch(self, s: str, p: str) -> bool: s_len, p_len = len(s), len(p) dp = [[False] * (p_len + 1) for temp in range(s_len + 1)]; dp[0][0] = True for one_row_item in range(p_len): if p[one_row_item] == '*' and dp[0][one_row_item - 1]: dp[0][one_row_item + 1] = True for i in range(1, s_len + 1): for j in range(1, p_len + 1): if p[j - 1] == s[i - 1] or p[j - 1] == '.': dp[i][j] = dp[i - 1][j - 1] elif p[j - 1] == '*': if p[j - 2] ! = s[i - 1]: dp[i][j] = dp[i][j - 2] if p[j - 2] == s[i - 1] or p[j - 2] == '.': dp[i][j] = dp[i][j - 2] or dp[i][j - 1] or dp[i - 1][j] return dp[s_len][p_len]Copy the code

In general, this problem is still difficult, at least it is the most difficult one so far, but it also took a lot of time to understand, and it is difficult to describe through words to understand, mainly through reading algorithm code to understand its core ideas.

Problems like this are solved by dynamic programming algorithms that require more practice.

I am Taoye, love study, love to share, is keen on all kinds of technology, the study of anime like playing chess, listening to music, chat, hoping to worlds to record your growth process as well as the life intravenous drip, also hope to be able to strong more within the circle of like-minded friends, more welcome visiting WeChat princess: cynicism Coder.

Recommended reading:

LeetCode HOT 100 (00, sum of two numbers) LeetCode HOT 100 (01, sum of two numbers) LeetCode HOT 100 (02, the oldest string without repeating characters) LeetCode HOT 100 (03, Find the median of two positive ordinal groups.

References:

[1] LeetCode official: leetcode-cn.com/problems/re…

[2] the dynamic programming algorithm: www.bilibili.com/video/BV1xb…