requirements
All DNA is made up of A series of nucleotides abbreviated to ‘A’, ‘C’, ‘G’ and ‘T’, such as’ ACGAATTCCG ‘. When studying DNA, it can sometimes be very helpful to identify repeated sequences in DNA.
Write a function to find all target substrings of length 10 that occur more than once in the DNA string s.
Example 1:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" output: [" AAAAACCCCC ", "CCCCCAAAAA"]Copy the code
Example 2:
Input: s = "AAAAAAAAAAAAA" Output: ["AAAAAAAAAA"]Copy the code
The core code
class Solution:
def findRepeatedDnaSequences(self, s: str) - >List[str] :
from collections import defaultdict
n = len(s)
lookup = defaultdict(int)
for i in range(0,n-9):
lookup[s[i:i+10]] + =1
return [key for key,value in lookup.items() if value > 1]
Copy the code
Another solution
class Solution:
def findRepeatedDnaSequences(self, s: str) - >List[str] :
repeatset = set()
res = set(a)for i in range(0.len(s)-9) :if s[i:i+10] in repeatset:
res.add(s[i:i+10])
else:
repeatset.add(s[i:i+10])
return list(res)
Copy the code
In the first solution, we iterate the string linearly, starting with the first character, taking out 10 characters at a time to form a string and putting them in a dictionary. Finally, if the value in the dictionary exceeds 1, we output the answer to the time complexity O(n), space complexity O(n). The second solution: we use two sets. If they exist in repeatset, the number of times will be more than twice in the second scan, which will be added to our answer list and output.