472. Conjunctions: Sequential DP (String hash optimization) word problem

Topic describes

This is the 472. Connectives in LeetCode. Difficulty is difficult.

Tag: “string hash”, “sequence DP”

Given a string array of words without repeating words, find and return all the connectives in the words.

A conjunction is defined as a string consisting entirely of at least two shorter words from a given array.

Example 1:

Input: words = ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"] ["catsdogcats","dogcatsdog","ratcatdogcat"] Explanation: "catsdogcats" consists of "cats", "dogs" and "cats"; "Dogcatsdog" consists of "dog", "cats" and "dog"; "Ratcatdogcat" consists of "rat", "cat", "dog" and "cat".Copy the code

Example 2:

Input: words = ["cat","dog","catdog"] Output: ["catdog"]Copy the code

Tip:

$1 <= words.length <= 10^4$
$0 <= words[i].length <= 1000$
Words [I] Words [I] Words [I] are lowercase letters only
$0 <= sum(words[i].length) <= 10^5$

Sequence DP + string hash

Given an array of wordsWordswords, consider how to determine if a s=words[I]s=words[I]s=words[I] is a connective.

For convenience, we call each of the connected parts that make up S item.

For example 🌰, for example s = ABC, whose possible combinations of items are a and BC.

Dynamic programming can be used to determine whether a single string is a conjunction: define f[I]f[I] as the maximum number of items that can be sliced considering the first iii characters of S (let the subscript start at 111).

The reason why “record f[I]f[I]f[I] is the largest partition item (int type dynamic gauge array)” is used instead of “record f[I]f[I]f[I] is whether it can be composed of multiple items (bool type dynamic gauge array)”. Because each s=words[I]s=words[I]s=words[I]s=words[I] can at least be composed of itself, if bool is used to record the state, the final f[n]f[n]f[n] must be True, and the last state needs to be processed extra, so it is better to record the maximum number of segmentation. If s is a connecter, f[n]>1f[n] >1f[n] >1.

If f[I]f[I]f[I]f[I] can be transferred from f[j]f[j]f[j] f[j]f[j]f[j] (where j

The complexity of enumeration III and JJJ has gone to O(n2)O(n^2)O(n2). If the existence of a string is determined by data structures such as HashMap, the characters need to be traversed when the hash function is executed. The overall complexity has gone to O(n3)O(n^3)O(n3). TLE.

We use “string hashing” to optimize whether a substring exists $words$ In the.

For example, if s=words[I]s=words[I]s=words[I]s=words[I]s= Words [I] is a link word, the hash value of each word [I] is preprocessed. And put it into a HashSet, so that we can turn the problem of “determining whether a substring exists in Wordswordswords” into “determining whether a value exists in a Set.”

And since we are calculating the hash of a certain substring S, we are going to process each bit of s[I]s[I] post-processing, so when we transfer f[I]f[I]f[I]f[I], we expect to process the substring from front to back, I – this is regular from [0, 1], [0, 1] I [0, I – 1] range for transferable point f [j] [j] f f [j] can’t do it.

So we adjust the transfer logic as: From f [I] [I], [I] f f enumeration range [I + 1, n] [I + 1, n] [I + 1, n), found by f [I] [I] f f [I] can update the status of the f [j] [j] f f [j], And try to update f[j]f[j]f[j] f[j] with f[I]f[I]f[I]. The transfer equation is:

f[j] = \max(f[j], f[i] + 1)

Of course, f[I]f[I]f[I]f[I] is valid, and the substring s[(I +1),j]s[(I +1),j]s[(I +1),j]s[(I +1),j] is in The WordsWordslist.

Some details: For convenience, we define f[I]=−1f[I]= -1f[I]=−1 as invalid;

Because the string hash will produce hash collision, here at the time of computing the hash value, modify the hash calculation (extra added a OFFSET), the purpose is to want to AC in front of my computer without electricity, and the other a more reliable way is to use double hash, or to simply record a string which corresponds to the hash value.

Code:

class Solution {
    Set<Long> set = new HashSet<>();
    int P = 131, OFFSET = 128;
    public List<String> findAllConcatenatedWordsInADict(String[] words) {
        for (String s : words) {
            long hash = 0;
            for (char c : s.toCharArray()) hash = hash * P + (c - 'a') + OFFSET;
            set.add(hash);
        }
        List<String> ans = new ArrayList<>();
        for (String s : words) {
            if (check(s)) ans.add(s);
        }
        return ans;
    }
    boolean check(String s) {
        int n = s.length();
        int[] f = new int[n + 1];
        Arrays.fill(f, -1);
        f[0] = 0;
        for (int i = 0; i <= n; i++) {
            if (f[i] == -1) continue;
            long cur = 0;
            for (int j = i + 1; j <= n; j++) {
                cur = cur * P + (s.charAt(j - 1) - 'a') + OFFSET;
                if (set.contains(cur)) f[j] = Math.max(f[j], f[i] + 1);
            }
            if (f[n] > 1) return true;
        }
        return false; }}Copy the code

Time complexity: let $n$ 为 $words$ Array length, $N = \sum_{i = 0}^{n – 1}words[i].length$ According to the data range $N$ The maximum is $1e5$ . Pretreatment of theSetIs the complexity of $O(N)$ ; To all $words[i]$ performcheckOperation, the complexity is $O((words[i].length)^2)$ , the maximum value of total computation is $O(N^2)$ Because of pruning, the amount of calculation can not be achieved
Space complexity: O (n + Max ⁡ (words [I] length)) O (n + \ Max (words [I] length)) O (n + Max (words [I] length))

The last

This is the No.472 of our “Brush through LeetCode” series, which started on 2021/01/01. There are 1916 topics in LeetCode as of the start date, some of which have locks, we will first finish all the topics without locks.

In this series of articles, in addition to explaining how to solve the problem, I will also present the most concise code possible. If a general solution is involved, the corresponding code template will also be used.

In order to facilitate the students to debug and submit the code on the computer, I set up the relevant warehouse: github.com/SharingSour…

In the repository, you’ll see links to the series of articles, the corresponding code for the series of articles, the LeetCode source links, and other preferred solutions.

472. Conjunctions: Sequential DP (String hash optimization) word problem

Topic describes

Sequence DP + string hash

The last

Related Posts

Bean life cycle

Leetcode 682. Baseball Game (Python)

Network Programming learning 8– Closing TCP connections (close, shutdown function)