# Problem
# For positive integers a and n, a modulo n (written amodn in shorthand) is the remainder when a is divided by n. For example, 29mod11=7 because 29=11×2+7.
# Modular arithmetic is the study of addition, subtraction, multiplication, and division with respect to the modulo operation. We say that a and b are congruent modulo n if amodn=bmodn; in this case, we use the notation a≡bmodn.
# Two useful facts in modular arithmetic are that if a≡bmodn and c≡dmodn, then a+c≡b+dmodn and a×c≡b×dmodn. To check your understanding of these rules, you may wish to verify these relationships for a=29, b=73, c=10, d=32, and n=11.
# As you will see in this exercise, some Rosalind problems will ask for a (very large) integer solution modulo a smaller number to avoid the computational pitfalls that arise with storing such large numbers.
#
# Given: A protein string of length at most 1000 aa.
# Return: The total number of different RNA strings from which the protein could have been translated, modulo 1,000,000. (Don't neglect the importance of the stop codon in protein translation.)
#
# Sample Dataset
# MA
# Sample Output
# 12
# 这道题目是关于模运算和蛋白质翻译的问题。给定一个不超过1000个氨基酸(字母“F”,“L”,“I”,“M”,“V”,“S”,“P”,“T”,“A”,“Y”,“H”,“Q”,“N”,“K”,“D”,“E”,“C”,“W”,“R”,“G”的字符串),求可以翻译成该蛋白质的不同RNA序列数目,答案对1,000,000取模。
# 简单介绍一下蛋白质翻译。RNA是由四种碱基(腺嘌呤、尿嘧啶、胞嘧啶和鸟嘌呤)组成的。氨基酸则由RNA上的三个碱基编码而来,其中有一个“起始密码子”(AUG),还有三个“停止密码子”(UAA,UAG和UGA),分别表示蛋白质链的开端和结束。因此,一个蛋白质可以用RAN序列编码,一个RNA序列也可能被翻译成不同的蛋白质。
# 这里需要用到模运算。模运算是一种基本的整数运算,它给出了除法的余数。例如a mod n(记作amodn)表示a除以n的余数。当比较大的整数进行计算时,可以使用模运算来简化问题。如果两个整数在模n时相等,则它们被称为模n同余。
# 这道题目的思路如下:
# 准备好DNA中20个氨基酸对应的RNA序列表格;
# 对于给定的蛋白质序列,根据RNA序列表格找到所有可能的RNA序列组合;
# 注意考虑到终止密码子;
# 对于所有可能的RNA序列,计算它们的总数,并将结果对1,000,000取模。
# 计算RNA序列数
def count_rna_strings(protein_string):
# 翻译表
codon_table = {
"UUU": "F", "CUU": "L", "AUU": "I", "GUU": "V",
"UUC": "F", "CUC": "L", "AUC": "I", "GUC": "V",
"UUA": "L", "CUA": "L", "AUA": "I", "GUA": "V",
"UUG": "L", "CUG": "L", "AUG": "M", "GUG": "V",
"UCU": "S", "CCU": "P", "ACU": "T", "GCU": "A",
"UCC": "S", "CCC": "P", "ACC": "T", "GCC": "A",
"UCA": "S", "CCA": "P", "ACA": "T", "GCA": "A",
"UCG": "S", "CCG": "P", "ACG": "T", "GCG": "A",
"UAU": "Y", "CAU": "H", "AAU": "N", "GAU": "D",
"UAC": "Y", "CAC": "H", "AAC": "N", "GAC": "D",
"UAA": "*", "CAA": "Q", "AAA": "K", "GAA": "E",
"UAG": "*", "CAG": "Q", "AAG": "K", "GAG": "E",
"UGU": "C", "CGU": "R", "AGU": "S", "GGU": "G",
"UGC": "C", "CGC": "R", "AGC": "S", "GGC": "G",
"UGA": "*", "CGA": "R", "AGA": "R", "GGA": "G",
"UGG": "W", "CGG": "R", "AGG": "R", "GGG": "G"
}
# 统计RNA序列数量
rna_count = 1 # 起始为1,包含终止密码子
for aa in protein_string:
codons_count = list(codon_table.values()).count(aa)
rna_count = (rna_count * codons_count) % 1000000
return rna_count * 3 # 乘上三个终止密码子
# 给出蛋白质序列
protein_string = "MA"
# 输出结果
print(count_rna_strings(protein_string))