187. Repeated DNA Sequences

问题

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

例子

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].

分析

首先要明确，一个长度为n的字符串有且仅有n-9个长度为10的子字符串。
定义两个unordered_set: allSet, findSet. allSet用来保存所有遍历过的长度为10的子字符串，
findSet用来保存已经找到的出现次数不小于2次的长度为10的子字符串。findSet是用来防止结果列表里出现重复的子字符串。

要点

防止检测到重复的子字符串；
vector的构造函数可以传入unordered_set的首尾迭代器（迭代器实在是太强大了！）。

时间复杂度

O(n)

空间复杂度

O(n)

代码

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        if (s.length() < 11) return vector<string>();

        unordered_set<string> allSet, findSet;
        for (int i = 0; i <= s.length() - 10; i++) {
            string substring = s.substr(i, 10);
            if (allSet.find(substring) != allSet.end() && 
                findSet.find(substring) == findSet.end())
                findSet.insert(substring);
            else
                allSet.insert(substring);
        }
        
        return vector<string>(findSet.begin(), findSet.end());
    }
};

最后编辑于：2017.12.07 02:41:23

©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成，浏览时请结合常识与多方信息审慎甄别。
平台声明：文章内容（如有图片或视频亦包括在内）由作者上传并发布，文章内容仅代表作者本人观点，简书系信息发布平台，仅提供信息存储服务。

187. Repeated DNA Sequences