rosalind练习题十四

# Problem

# A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".

# Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".

#

# Given: A collection of k (k≤100) DNA strings of length at most 1 kbp each in FASTA format.

# Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)

#

# Sample Dataset

# >Rosalind_1

# GATTACA

# >Rosalind_2

# TAGACCA

# >Rosalind_3

# ATACA

# Sample Output

# AC

# 题目要求我们找到一个 DNA 字符串集合中的最长公共子串,即motif。

def find_common_motif(strings):

    shortest_string = min(strings, key=len)

    for length in range(len(shortest_string), 0, -1):

        for start in range(len(shortest_string) - length + 1):

            substring = shortest_string[start:start + length]

            if all(substring in string for string in strings):

                return substring

    return ""

# 读入数据

strings = []

with open("input4.txt", "r") as f:

    current_string = ""

    for line in f:

        if line.startswith(">"):  # FASTA格式下一行为数据描述,跳过

            continue

        current_string += line.strip()

        if current_string[-1] == "":  # 处理最后一个字符为空格的情况

            current_string = current_string[:-1]

        strings.append(current_string)

        current_string = ""

# 寻找最长公共子串并打印输出

result = find_common_motif(strings)

print(result)

©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容