requirements

All DNA is made up of A series of nucleotides abbreviated to ‘A’, ‘C’, ‘G’ and ‘T’, such as’ ACGAATTCCG ‘. When studying DNA, it can sometimes be very helpful to identify repeated sequences in DNA.

Write a function to find all target substrings of length 10 that occur more than once in the DNA string s.

Example 1:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" output: [" AAAAACCCCC ", "CCCCCAAAAA"]Copy the code

Example 2:

Input: s = "AAAAAAAAAAAAA" Output: ["AAAAAAAAAA"]Copy the code

The core code

class Solution:
    def findRepeatedDnaSequences(self, s: str) - >List[str] :
        from collections import defaultdict
        n = len(s)
        lookup = defaultdict(int)
        for i in range(0,n-9):
            lookup[s[i:i+10]] + =1
        return [key for key,value in lookup.items() if value > 1]
Copy the code

Another solution

class Solution:
    def findRepeatedDnaSequences(self, s: str) - >List[str] :
        repeatset = set()
        res = set(a)for i in range(0.len(s)-9) :if s[i:i+10] in repeatset:
                res.add(s[i:i+10])
            else:
                repeatset.add(s[i:i+10])
        return list(res)
Copy the code

In the first solution, we iterate the string linearly, starting with the first character, taking out 10 characters at a time to form a string and putting them in a dictionary. Finally, if the value in the dictionary exceeds 1, we output the answer to the time complexity O(n), space complexity O(n). The second solution: we use two sets. If they exist in repeatset, the number of times will be more than twice in the second scan, which will be added to our answer list and output.