rosalind练习题二十五

# Problem

# For a collection of strings, a larger string containing every one of the smaller strings as a substring is called a superstring.

# By the assumption of parsimony, a shortest possible superstring over a collection of reads serves as a candidate chromosome.

# Given: At most 50 DNA strings of approximately equal length, not exceeding 1 kbp, in FASTA format (which represent reads deriving from the same strand of a single linear chromosome).The dataset is guaranteed to satisfy the following condition: there exists a unique way to reconstruct the entire chromosome from these reads by gluing together pairs of reads that overlap by more than half their length.

# Return: A shortest superstring containing all the given strings (thus corresponding to a reconstructed chromosome).

# Sample Dataset

# >Rosalind_56

# ATTAGACCTG

# >Rosalind_57

# CCTGCCGGAA

# >Rosalind_58

# AGACCTGCCG

# >Rosalind_59

# GCCGGAATAC

# Sample Output

# ATTAGACCTGCCGGAATAC

# 这道题是要我们将给定的DNA序列拼接起来，得到一个包含所有序列的最短超级字符串。可用贪心算法。

# 计算字符串之间的重叠部分长度

def overlap(s, t):

max_len =min(len(s), len(t))

for iin range(max_len, 0, -1):

if s[-i:] == t[:i]:

return i

return 0

# 查找下一个要拼接的字符串

def find_next_string(current_string, unused_strings):

overlap_lengths = [overlap(current_string, s)for sin unused_strings]

next_index = overlap_lengths.index(max(overlap_lengths))

return unused_strings.pop(next_index)

# 使用贪心算法拼接字符串

def shortest_superstring(strings):

superstring = strings.pop(0)

while strings:

next_string = find_next_string(superstring, strings)

overlap_len = overlap(superstring, next_string)

superstring += next_string[overlap_len:]

return superstring

# 读取文件

filename ="input7.txt"

with open(filename, 'r')as f:

lines = [line.strip()for linein f.readlines()]

# 提取字符串列表

strings = []

current_string =''

for linein lines:

if line.startswith('>'):

strings.append(current_string)

current_string =''

else:

current_string += line

strings.append(current_string)

# 寻找最短的超级字符串

superstring = shortest_superstring(strings)

# 输出结果

print(superstring)

©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成，浏览时请结合常识与多方信息审慎甄别。
平台声明：文章内容（如有图片或视频亦包括在内）由作者上传并发布，文章内容仅代表作者本人观点，简书系信息发布平台，仅提供信息存储服务。

rosalind练习题二十五

rosalind练习题二十五

相关阅读更多精彩内容

友情链接更多精彩内容