rosalind练习题十七

# Problem

# For positive integers a and n, a modulo n (written amodn in shorthand) is the remainder when a is divided by n. For example, 29mod11=7 because 29=11×2+7.

# Modular arithmetic is the study of addition, subtraction, multiplication, and division with respect to the modulo operation. We say that a and b are congruent modulo n if amodn=bmodn; in this case, we use the notation a≡bmodn.

# Two useful facts in modular arithmetic are that if a≡bmodn and c≡dmodn, then a+c≡b+dmodn and a×c≡b×dmodn. To check your understanding of these rules, you may wish to verify these relationships for a=29, b=73, c=10, d=32, and n=11.

# As you will see in this exercise, some Rosalind problems will ask for a (very large) integer solution modulo a smaller number to avoid the computational pitfalls that arise with storing such large numbers.

# Given: A protein string of length at most 1000 aa.

# Return: The total number of different RNA strings from which the protein could have been translated, modulo 1,000,000. (Don't neglect the importance of the stop codon in protein translation.)

# Sample Dataset

# MA

# Sample Output

# 12

# 这道题目是关于模运算和蛋白质翻译的问题。给定一个不超过1000个氨基酸（字母“F”，“L”，“I”，“M”，“V”，“S”，“P”，“T”，“A”，“Y”，“H”，“Q”，“N”，“K”，“D”，“E”，“C”，“W”，“R”，“G”的字符串），求可以翻译成该蛋白质的不同RNA序列数目，答案对1,000,000取模。

# 简单介绍一下蛋白质翻译。RNA是由四种碱基（腺嘌呤、尿嘧啶、胞嘧啶和鸟嘌呤）组成的。氨基酸则由RNA上的三个碱基编码而来，其中有一个“起始密码子”（AUG），还有三个“停止密码子”（UAA，UAG和UGA），分别表示蛋白质链的开端和结束。因此，一个蛋白质可以用RAN序列编码，一个RNA序列也可能被翻译成不同的蛋白质。

# 这里需要用到模运算。模运算是一种基本的整数运算，它给出了除法的余数。例如a mod n（记作amodn）表示a除以n的余数。当比较大的整数进行计算时，可以使用模运算来简化问题。如果两个整数在模n时相等，则它们被称为模n同余。

# 这道题目的思路如下：

# 准备好DNA中20个氨基酸对应的RNA序列表格；

# 对于给定的蛋白质序列，根据RNA序列表格找到所有可能的RNA序列组合；

# 注意考虑到终止密码子；

# 对于所有可能的RNA序列，计算它们的总数，并将结果对1,000,000取模。

# 计算RNA序列数

def count_rna_strings(protein_string):

# 翻译表

codon_table = {

"UUU": "F", "CUU": "L", "AUU": "I", "GUU": "V",

"UUC": "F", "CUC": "L", "AUC": "I", "GUC": "V",

"UUA": "L", "CUA": "L", "AUA": "I", "GUA": "V",

"UUG": "L", "CUG": "L", "AUG": "M", "GUG": "V",

"UCU": "S", "CCU": "P", "ACU": "T", "GCU": "A",

"UCC": "S", "CCC": "P", "ACC": "T", "GCC": "A",

"UCA": "S", "CCA": "P", "ACA": "T", "GCA": "A",

"UCG": "S", "CCG": "P", "ACG": "T", "GCG": "A",

"UAU": "Y", "CAU": "H", "AAU": "N", "GAU": "D",

"UAC": "Y", "CAC": "H", "AAC": "N", "GAC": "D",

"UAA": "*", "CAA": "Q", "AAA": "K", "GAA": "E",

"UAG": "*", "CAG": "Q", "AAG": "K", "GAG": "E",

"UGU": "C", "CGU": "R", "AGU": "S", "GGU": "G",

"UGC": "C", "CGC": "R", "AGC": "S", "GGC": "G",

"UGA": "*", "CGA": "R", "AGA": "R", "GGA": "G",

"UGG": "W", "CGG": "R", "AGG": "R", "GGG": "G"

}

# 统计RNA序列数量

rna_count = 1 # 起始为1，包含终止密码子

for aa in protein_string:

codons_count = list(codon_table.values()).count(aa)

rna_count = (rna_count * codons_count) % 1000000

return rna_count * 3 # 乘上三个终止密码子

# 给出蛋白质序列

protein_string = "MA"

# 输出结果

print(count_rna_strings(protein_string))

rosalind练习题十七

rosalind练习题十七

相关阅读更多精彩内容

友情链接更多精彩内容