- Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",Return:["AAAAACCCCC", "CCCCCAAAAA"].
Subscribe to see which companies asked this question
1)直接在哈希表中存入10个字符恶毒字符串。
RE?
超过限额内存?
2)用两个Bit保存一个核苷酸,20Bit保存十个字符,用一个int即可。
public class Solution {
public List<String> findRepeatedDnaSequences(String s) {
Map<Character,Integer> charMap = new HashMap<>();
charMap.put('A',0);
charMap.put('C',1);
charMap.put('G',2);
charMap.put('T',3);
Map<Integer,Integer> seqMap = new HashMap<>();
List<String> result = new ArrayList<String>();
int sum = 0;
for(int i=0; i<s.length(); i++) {
sum = ((sum << 2) + charMap.get(s.charAt(i))) ;
sum = sum & 0xFFFFF;
if(i < 9) continue;
Integer v = seqMap.get(sum);
if(v!=null && v==1){
result.add(s.substring(i-9,i+1));
}
seqMap.put(sum, v!=null?v+1:1);
}
return result;
}
}